Class MongoDbConnector

java.lang.Object
org.apache.kafka.connect.connector.Connector
org.apache.kafka.connect.source.SourceConnector
io.debezium.connector.mongodb.MongoDbConnector
All Implemented Interfaces:
org.apache.kafka.connect.components.Versioned

public class MongoDbConnector extends org.apache.kafka.connect.source.SourceConnector
A Kafka Connect source connector that creates tasks that replicate the content of one or more MongoDB replica sets.

Sharded Clusters

This connector is able to fully replicate the content of one sharded MongoDB 3.2 cluster. In this case, simply configure the connector with the host addresses of the configuration replica set. When the connector starts, it will discover and replicate the replica set for each shard.

Replica Set

The connector is able to fully replicate the content of one MongoDB 3.2 replica set. (Older MongoDB servers may work but have not been tested.) In this case, simply configure the connector with the host addresses of the replica set. When the connector starts, it will discover the primary node and use it to replicate the contents of the replica set.
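For example, the configuration passed to start(Map) might resemble the following sketch. The "topic.prefix" property is the one described under "Use of Topics" below; the "mongodb.hosts" property name and the host addresses are illustrative assumptions only, so consult MongoDbConnectorConfig for the authoritative property names.

```java
import java.util.HashMap;
import java.util.Map;

public class ConnectorConfigSketch {
    // Builds an illustrative configuration map for a replica set
    // (or, for a sharded cluster, the configuration replica set).
    // Property names other than "topic.prefix" are assumptions.
    public static Map<String, String> replicaSetConfig() {
        Map<String, String> props = new HashMap<>();
        // Host addresses of the replica set; name and value are hypothetical.
        props.put("mongodb.hosts", "rs0/mongo1:27017,mongo2:27017");
        // Logical name used as the first segment of every topic name.
        props.put("topic.prefix", "inventory");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(replicaSetConfig());
    }
}
```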

Parallel Replication

The connector will concurrently and independently replicate each of the replica sets. When the connector is asked to allocate tasks, it will attempt to allocate a separate task for each replica set. However, if the number of replica sets exceeds the maximum number of tasks, then some tasks will replicate multiple replica sets. Note that each task will use a separate thread to replicate each of its assigned replica sets.
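The allocation contract described above can be sketched as a simple round-robin grouping. This is an illustration of the behavior, not Debezium's actual taskConfigs(int) implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskAllocationSketch {
    // Groups replica sets into at most maxTasks tasks: one replica set per
    // task when possible, otherwise several replica sets share a task.
    public static List<List<String>> allocate(List<String> replicaSets, int maxTasks) {
        int numTasks = Math.min(maxTasks, replicaSets.size());
        List<List<String>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new ArrayList<>());
        }
        for (int i = 0; i < replicaSets.size(); i++) {
            // Round-robin: the i-th replica set goes to task (i mod numTasks).
            tasks.get(i % numTasks).add(replicaSets.get(i));
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Three replica sets but only two tasks: one task gets two replica sets.
        System.out.println(allocate(List.of("rs0", "rs1", "rs2"), 2));
    }
}
```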

Initial Sync and Reading the Oplog

When a connector begins to replicate a sharded cluster or replica set for the first time, it will perform an initial sync of the collections in the replica set by generating source records for each document in each collection. Only when this initial sync completes successfully will the replication then use the replica set's primary node to read the oplog and produce source records for each oplog event. The replication process records the position of each oplog event as an offset, so that upon restart the replication process can use the last recorded offset to determine where in the oplog to resume reading and processing events.
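The restart decision described above can be sketched as follows. The method and map shape are hypothetical illustrations of the offset logic, not Debezium's API.

```java
import java.util.Map;

public class OffsetResumeSketch {
    // Decides how replication resumes for a replica set: with no recorded
    // offset an initial sync runs first; otherwise oplog reading resumes
    // from the last recorded position. Names here are illustrative.
    public static String resumeAction(Map<String, Long> recordedOffsets, String replicaSet) {
        Long last = recordedOffsets.get(replicaSet);
        if (last == null) {
            return "initial sync, then read oplog from the start";
        }
        return "resume reading oplog from recorded position " + last;
    }

    public static void main(String[] args) {
        System.out.println(resumeAction(Map.of(), "rs0"));
        System.out.println(resumeAction(Map.of("rs0", 12345L), "rs0"));
    }
}
```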

Use of Topics

The connector writes all of the source records that correspond to a single collection to a separate topic. The topic is named "<logicalName>.<databaseName>.<collectionName>", where <logicalName> is set via the "topic.prefix" configuration property.
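The naming rule above is a straightforward string join, sketched here for clarity:

```java
public class TopicNamingSketch {
    // Builds the topic name "<logicalName>.<databaseName>.<collectionName>"
    // exactly as described above; logicalName comes from "topic.prefix".
    public static String topicFor(String logicalName, String database, String collection) {
        return logicalName + "." + database + "." + collection;
    }

    public static void main(String[] args) {
        System.out.println(topicFor("inventory", "shop", "orders"));
    }
}
```

So with "topic.prefix" set to "inventory", documents from the "orders" collection of the "shop" database land on the topic "inventory.shop.orders".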

Configuration

This connector is configured with the set of properties described in MongoDbConnectorConfig.

Author:
Randall Hauch
  • Constructor Details

    • MongoDbConnector

      public MongoDbConnector()
  • Method Details

    • version

      public String version()
    • taskClass

      public Class<? extends org.apache.kafka.connect.connector.Task> taskClass()
      Specified by:
      taskClass in class org.apache.kafka.connect.connector.Connector
    • start

      public void start(Map<String,String> props)
      Specified by:
      start in class org.apache.kafka.connect.connector.Connector
    • replicaSetsChanged

      protected void replicaSetsChanged(ReplicaSets replicaSets)
    • taskConfigs

      public List<Map<String,String>> taskConfigs(int maxTasks)
      Specified by:
      taskConfigs in class org.apache.kafka.connect.connector.Connector
    • stop

      public void stop()
      Specified by:
      stop in class org.apache.kafka.connect.connector.Connector
    • config

      public org.apache.kafka.common.config.ConfigDef config()
      Specified by:
      config in class org.apache.kafka.connect.connector.Connector
    • validate

      public org.apache.kafka.common.config.Config validate(Map<String,String> connectorConfigs)
      Overrides:
      validate in class org.apache.kafka.connect.connector.Connector