Class MongoDbConnector

  • All Implemented Interfaces:
    org.apache.kafka.connect.components.Versioned

    public class MongoDbConnector
    extends org.apache.kafka.connect.source.SourceConnector
    A Kafka Connect source connector that creates tasks that replicate the content of one or more MongoDB replica sets.

    Sharded Clusters

    This connector is able to fully replicate the content of one sharded MongoDB 3.2 cluster. In this case, simply configure the connector with the host addresses of the configuration replica set. When the connector starts, it will discover and replicate the replica set for each shard.

    Replica Set

    The connector is able to fully replicate the content of one MongoDB 3.2 replica set. (Older MongoDB servers may work but have not been tested.) In this case, simply configure the connector with the host addresses of the replica set. When the connector starts, it will discover the primary node and use it to replicate the contents of the replica set.

    If necessary, a configuration property can be used to disable the logic used to discover the primary node, and in this case the connector will use the first host address specified in the configuration as the primary node. This may cause problems when the replica set elects a different node as the primary, since the connector will continue to read the oplog from the original node, which may no longer be the primary.
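
    As a rough illustration, the properties for a single replica set might be assembled as in the sketch below. Only "mongodb.name" is named on this page; the keys "mongodb.hosts" and "mongodb.members.auto.discover", the helper class, and the sample values are assumptions that should be checked against MongoDbConnectorConfig.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the connector.
class ReplicaSetConfigSketch {

    // Builds connector properties for one replica set.
    // "mongodb.hosts" and "mongodb.members.auto.discover" are assumed key names.
    static Map<String, String> replicaSetProps(String logicalName, String hosts, boolean discoverPrimary) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mongodb.name", logicalName);
        props.put("mongodb.hosts", hosts);
        // When false, the first listed host is treated as the primary.
        props.put("mongodb.members.auto.discover", Boolean.toString(discoverPrimary));
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props =
                replicaSetProps("fulfillment", "rs0/host1:27017,host2:27017", false);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

    With discovery disabled, the order of the host list matters, because the first address is the one the connector keeps reading from.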

    Parallel Replication

    The connector will concurrently and independently replicate each of the replica sets. When the connector is asked to allocate tasks, it will attempt to allocate a separate task for each replica set. However, if the number of replica sets exceeds the maximum number of tasks, then some tasks will replicate multiple replica sets. Note that each task will use a separate thread to replicate each of its assigned replica sets.
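
    A minimal sketch of how replica sets might be divided among tasks, assuming a simple round-robin split. The class and method names are hypothetical, and the connector's actual partitioning logic is not described on this page. When there are fewer tasks than replica sets, some groups hold more than one replica set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the connector.
class TaskAssignmentSketch {

    // Splits replica set names into at most maxTasks groups, round-robin.
    static List<List<String>> assign(List<String> replicaSets, int maxTasks) {
        if (maxTasks < 1) {
            throw new IllegalArgumentException("maxTasks must be positive");
        }
        // Never create more groups than there are replica sets.
        int groupCount = Math.min(maxTasks, replicaSets.size());
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < groupCount; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < replicaSets.size(); i++) {
            groups.get(i % groupCount).add(replicaSets.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("shard0", "shard1", "shard2");
        System.out.println(assign(shards, 2)); // [[shard0, shard2], [shard1]]
        System.out.println(assign(shards, 5)); // [[shard0], [shard1], [shard2]]
    }
}
```

    Each inner list would correspond to one task configuration, and a task holding several replica sets would start one thread per entry.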

    Initial Sync and Reading the Oplog

    When a connector begins to replicate a sharded cluster or replica set for the first time, it will perform an initial sync of the collections in the replica set by generating source records for each document in each collection. Only when this initial sync completes successfully will the replication then use the replica set's primary node to read the oplog and produce source records for each oplog event. The replication process records the position of each oplog event as an offset, so that upon restart the replication process can use the last recorded offset to determine where in the oplog it is to begin reading and processing events.
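
    The restart decision described above can be sketched as follows. The Offset type, its initialSyncComplete flag, and the method names are hypothetical; this page does not state what the connector actually stores in its offsets.

```java
import java.util.Optional;

// Hypothetical model of the start-up decision, not the connector's own code.
class ReplicationStartSketch {

    enum Phase { INITIAL_SYNC, READ_OPLOG }

    // Assumed offset shape: an oplog position plus a flag for a finished initial sync.
    static final class Offset {
        final long oplogPosition;
        final boolean initialSyncComplete;

        Offset(long oplogPosition, boolean initialSyncComplete) {
            this.oplogPosition = oplogPosition;
            this.initialSyncComplete = initialSyncComplete;
        }
    }

    // No offset, or an unfinished initial sync, means the sync must run (again).
    static Phase startingPhase(Optional<Offset> lastOffset) {
        if (!lastOffset.isPresent() || !lastOffset.get().initialSyncComplete) {
            return Phase.INITIAL_SYNC;
        }
        return Phase.READ_OPLOG;
    }

    public static void main(String[] args) {
        System.out.println(startingPhase(Optional.empty()));                    // INITIAL_SYNC
        System.out.println(startingPhase(Optional.of(new Offset(42L, false)))); // INITIAL_SYNC
        System.out.println(startingPhase(Optional.of(new Offset(42L, true))));  // READ_OPLOG
    }
}
```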

    Use of Topics

    The connector will write to a separate topic all of the source records that correspond to a single collection. The topic will be named "<logicalName>.<databaseName>.<collectionName>", where <logicalName> is set via the "mongodb.name" configuration property.
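
    The naming pattern can be illustrated with a small sketch. The helper class and the sample names are hypothetical; only the "<logicalName>.<databaseName>.<collectionName>" pattern comes from the description above.

```java
// Hypothetical helper, not part of the connector.
class TopicNameSketch {

    // Joins the three parts with dots, following the documented pattern.
    static String topicFor(String logicalName, String databaseName, String collectionName) {
        return logicalName + "." + databaseName + "." + collectionName;
    }

    public static void main(String[] args) {
        // Records from the "products" collection in the "inventory" database.
        System.out.println(topicFor("fulfillment", "inventory", "products")); // fulfillment.inventory.products
    }
}
```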

    Configuration

    This connector is configured with the set of properties described in MongoDbConnectorConfig.

    Author:
    Randall Hauch
    • Constructor Detail

      • MongoDbConnector

        public MongoDbConnector()
    • Method Detail

      • version

        public String version()
      • taskClass

        public Class<? extends org.apache.kafka.connect.connector.Task> taskClass()
        Specified by:
        taskClass in class org.apache.kafka.connect.connector.Connector
      • start

        public void start(Map<String,String> props)
        Specified by:
        start in class org.apache.kafka.connect.connector.Connector
      • replicaSetsChanged

        protected void replicaSetsChanged(ReplicaSets replicaSets)
      • taskConfigs

        public List<Map<String,String>> taskConfigs(int maxTasks)
        Specified by:
        taskConfigs in class org.apache.kafka.connect.connector.Connector
      • stop

        public void stop()
        Specified by:
        stop in class org.apache.kafka.connect.connector.Connector
      • config

        public org.apache.kafka.common.config.ConfigDef config()
        Specified by:
        config in class org.apache.kafka.connect.connector.Connector
      • validate

        public org.apache.kafka.common.config.Config validate(Map<String,String> connectorConfigs)
        Overrides:
        validate in class org.apache.kafka.connect.connector.Connector