Starting with version 3.2, Apache Marmotta can be configured to run in a clustered or cloud environment. Instances in the cluster are managed using Apache Zookeeper, which needs to be installed separately. A clustered setup only works with the KiWi triple store backend, and all servers in a cluster must use a common database (or database cluster).
The following diagram gives an overview of a typical Apache Marmotta cluster setup:
Setting up Zookeeper for Apache Marmotta is straightforward, as the Zookeeper module does most of the initialisation for you. To connect to a running Zookeeper server, the following configuration options need to be passed to Marmotta, either as system properties or as servlet context parameters:
The main feature of the Zookeeper module is that Marmotta instances can be configured automatically through Zookeeper. Marmotta instances react to configuration changes in Zookeeper and update their local configuration accordingly. The Apache Marmotta configuration stored in Zookeeper has the following structure:
+ marmotta
  + config                         - global configuration options
  | + <config_key>  <config_value>
  | + ...
  + clusters
    + default
    | + config                     - cluster-level configuration options
    | | + <config_key>  <config_value>
    | | + ...
    | + snowflake                  - used for generating unique IDs
    | + instances
    |   + <instance_name>
    |   | + config                 - instance-level configuration options
    |   |   + <config_key>  <config_value>
    |   |   + ...
    |   + <instance_name2>
    |   | + ...
    |   + ...
    + <cluster1>
      + config
      | + ...
      + ...
Configuration values can be stored at one of three levels: the global level, where they apply to all Marmotta instances; the cluster level, where they apply only to the Marmotta instances in a single cluster; or the instance level, where they apply only to a specific Marmotta instance. More specific configurations take precedence over more generic ones.
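The precedence rule can be illustrated with a small sketch. It is not Marmotta's implementation: the Zookeeper tree is replaced by a plain dictionary mapping node paths to values, the paths are derived from the structure shown above, and the configuration keys (`example.greeting`, `example.timeout`) as well as the helper names are hypothetical.

```python
ROOT = "/marmotta"


def config_prefixes(cluster, instance):
    """Return the config node prefixes, ordered from most generic to most specific."""
    return [
        f"{ROOT}/config",                                          # global level
        f"{ROOT}/clusters/{cluster}/config",                       # cluster level
        f"{ROOT}/clusters/{cluster}/instances/{instance}/config",  # instance level
    ]


def effective_config(tree, cluster, instance):
    """Merge the three levels; values from more specific levels override generic ones."""
    merged = {}
    for prefix in config_prefixes(cluster, instance):
        for path, value in tree.items():
            if path.startswith(prefix + "/"):
                key = path[len(prefix) + 1:]
                merged[key] = value
    return merged


# Hypothetical in-memory stand-in for the Zookeeper tree (keys are made up).
tree = {
    "/marmotta/config/example.greeting": "global",
    "/marmotta/config/example.timeout": "30",
    "/marmotta/clusters/default/config/example.timeout": "60",
    "/marmotta/clusters/default/instances/node1/config/example.greeting": "node1",
}

node1 = effective_config(tree, "default", "node1")
node2 = effective_config(tree, "default", "node2")
print(node1)
print(node2)
```

In this sketch, `node1` sees its own instance-level `example.greeting` and the cluster-level `example.timeout`, while `node2`, which has no instance-level entries, falls back to the global `example.greeting`.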
All servers in a cluster need to be configured to access the same database. Therefore, the following configuration properties should be defined on the cluster level in Zookeeper:
When initialising the cluster for the first time, it is advisable to start up only a single Marmotta instance first so that it can set up the necessary database tables. When the database initialisation is complete, all other instances can be started up in any order. When a new instance starts up, the Zookeeper module automatically creates a proper datacenter ID for generating database IDs that are unique across the cluster.
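Why a per-instance datacenter ID is enough to keep database IDs unique across the cluster can be seen in a generic Snowflake-style sketch. The bit widths (10 bits for the datacenter ID, 12 bits for the sequence) and all names below are assumptions chosen for illustration; the actual layout used by Marmotta is not described here.

```python
import time

# Assumed field widths, for illustration only.
DATACENTER_BITS = 10
SEQUENCE_BITS = 12
MAX_DATACENTER = (1 << DATACENTER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def compose_id(timestamp_ms, datacenter_id, sequence):
    """Pack timestamp, datacenter ID and sequence number into one integer."""
    if not 0 <= datacenter_id <= MAX_DATACENTER:
        raise ValueError("datacenter_id out of range")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (DATACENTER_BITS + SEQUENCE_BITS))
        | (datacenter_id << SEQUENCE_BITS)
        | sequence
    )


class IdGenerator:
    """Generates increasing IDs that embed this instance's datacenter ID."""

    def __init__(self, datacenter_id, clock=lambda: int(time.time() * 1000)):
        self.datacenter_id = datacenter_id
        self.clock = clock  # injectable so the behaviour can be tested
        self.last_ms = -1
        self.sequence = 0

    def next_id(self):
        now = max(self.clock(), self.last_ms)  # never move backwards in time
        if now == self.last_ms:
            self.sequence += 1
            if self.sequence > MAX_SEQUENCE:
                now += 1  # sequence exhausted: borrow the next millisecond
                self.sequence = 0
        else:
            self.sequence = 0
        self.last_ms = now
        return compose_id(now, self.datacenter_id, self.sequence)


# Two instances with different datacenter IDs never produce the same ID,
# even when their clocks report the same millisecond.
a = IdGenerator(1, clock=lambda: 1000)
b = IdGenerator(2, clock=lambda: 1000)
ids_a = [a.next_id() for _ in range(5000)]
ids_b = [b.next_id() for _ in range(5000)]
print(set(ids_a).isdisjoint(ids_b))
```

Because the datacenter ID occupies its own bit range, instances only need to agree on who holds which ID, which is the coordination step the Zookeeper module performs at startup.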
When running in a high-load environment, it is also useful to run the database itself as a database cluster. PostgreSQL, for example, supports this; the setup of a high-availability cluster is described in the PostgreSQL documentation.
To run Apache Marmotta properly in a cluster, the following additional configuration options need to be considered: