Class SourceInfo


@NotThreadSafe public final class SourceInfo extends BaseSourceInfo
Information about the source of change events, including the position in the source binary log that we have previously processed.

The source partition information describes the database whose log is being consumed. Typically, the database is identified by the host address and port number of the MySQL server and the name of the database. Here's a JSON-like representation of an example database:

 {
     "server" : "production-server"
 }
 

The source offset information is included in each event and captures where the connector should restart if this event's offset is the last one recorded. The offset includes the binlog filename, the position of the first event in the binlog, the number of events to skip, and the number of rows to also skip.

Here's a JSON-like representation of an example offset:

 {
     "server_id": 112233,
     "ts_sec": 1465937,
     "gtid": "db58b0ae-2c10-11e6-b284-0242ac110002:199",
     "file": "mysql-bin.000003",
     "pos": 990,
     "event": 0,
     "row": 0,
     "snapshot": true
 }
 

The "gtids" field only appears in offsets produced when GTIDs are enabled. The "snapshot" field only appears in offsets produced when the connector is in the middle of a snapshot. Finally, the "ts_sec" field contains the seconds since the Unix epoch (January 1, 1970) of the MySQL event; the message envelopes also have a timestamp, but that timestamp is in milliseconds since January 1, 1970.
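The conditional fields described above can be illustrated with a minimal sketch. The class `OffsetSketch` and its `buildOffset` method are hypothetical names, not part of `SourceInfo`; the key names follow the description and example above (with "gtids" taken from the paragraph rather than the example).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SourceInfo: shows which offset fields are conditional.
final class OffsetSketch {

    /**
     * Builds an offset map. Pass a null gtidSet when GTIDs are not enabled.
     */
    static Map<String, Object> buildOffset(long serverId, long tsSec, String gtidSet,
                                           String file, long pos, long eventsToSkip,
                                           int rowsToSkip, boolean snapshotInProgress) {
        Map<String, Object> offset = new LinkedHashMap<>();
        offset.put("server_id", serverId);
        offset.put("ts_sec", tsSec);
        if (gtidSet != null) {
            // Only present when GTIDs are enabled.
            offset.put("gtids", gtidSet);
        }
        offset.put("file", file);
        offset.put("pos", pos);
        offset.put("event", eventsToSkip);
        offset.put("row", rowsToSkip);
        if (snapshotInProgress) {
            // Only present while a snapshot is in progress.
            offset.put("snapshot", true);
        }
        return offset;
    }

    public static void main(String[] args) {
        System.out.println(buildOffset(112233L, 1465937L, null,
                "mysql-bin.000003", 990L, 0L, 0, false));
    }
}
```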

Each change event envelope also includes the source struct that contains MySQL information about that particular event, including a mixture of the fields from the partition (which is renamed in the source to make more sense), the binlog filename and position where the event can be found, and, when GTIDs are enabled, the GTID of the transaction in which the event occurs. As with the offset, the "snapshot" field only appears for events produced when the connector is in the middle of a snapshot. Note that this information is likely different from the offset information, since the connector may need to restart from either just after the most recently completed transaction or the beginning of the most recently started transaction (whichever appears later in the binlog).

Here's a JSON-like representation of the source metadata for an event that corresponds to the above partition and offset:

 {
     "name": "production-server",
     "server_id": 112233,
     "ts_sec": 1465937,
     "gtid": "db58b0ae-2c10-11e6-b284-0242ac110002:199",
     "file": "mysql-bin.000003",
     "pos": 1081,
     "row": 0,
     "snapshot": true,
     "thread" : 1,
     "db" : "inventory",
     "table" : "products"
 }
 
Author:
Randall Hauch
  • Field Details

    • SERVER_ID_KEY

      public static final String SERVER_ID_KEY
    • SERVER_PARTITION_KEY

      public static final String SERVER_PARTITION_KEY
    • GTID_SET_KEY

      public static final String GTID_SET_KEY
    • GTID_KEY

      public static final String GTID_KEY
    • EVENTS_TO_SKIP_OFFSET_KEY

      public static final String EVENTS_TO_SKIP_OFFSET_KEY
    • BINLOG_FILENAME_OFFSET_KEY

      public static final String BINLOG_FILENAME_OFFSET_KEY
    • BINLOG_POSITION_OFFSET_KEY

      public static final String BINLOG_POSITION_OFFSET_KEY
    • BINLOG_ROW_IN_EVENT_OFFSET_KEY

      public static final String BINLOG_ROW_IN_EVENT_OFFSET_KEY
    • TIMESTAMP_KEY

      public static final String TIMESTAMP_KEY
    • THREAD_KEY

      public static final String THREAD_KEY
    • QUERY_KEY

      public static final String QUERY_KEY
    • DATABASE_WHITELIST_KEY

      public static final String DATABASE_WHITELIST_KEY
    • DATABASE_INCLUDE_LIST_KEY

      public static final String DATABASE_INCLUDE_LIST_KEY
    • DATABASE_BLACKLIST_KEY

      public static final String DATABASE_BLACKLIST_KEY
    • DATABASE_EXCLUDE_LIST_KEY

      public static final String DATABASE_EXCLUDE_LIST_KEY
    • TABLE_WHITELIST_KEY

      public static final String TABLE_WHITELIST_KEY
    • TABLE_INCLUDE_LIST_KEY

      public static final String TABLE_INCLUDE_LIST_KEY
    • TABLE_BLACKLIST_KEY

      public static final String TABLE_BLACKLIST_KEY
    • TABLE_EXCLUDE_LIST_KEY

      public static final String TABLE_EXCLUDE_LIST_KEY
    • RESTART_PREFIX

      public static final String RESTART_PREFIX
    • currentGtidSet

      private String currentGtidSet
    • currentGtid

      private String currentGtid
    • currentBinlogFilename

      private String currentBinlogFilename
    • currentBinlogPosition

      private long currentBinlogPosition
    • currentRowNumber

      private int currentRowNumber
    • currentEventLengthInBytes

      private long currentEventLengthInBytes
    • restartGtidSet

      private String restartGtidSet
    • restartBinlogFilename

      private String restartBinlogFilename
    • restartBinlogPosition

      private long restartBinlogPosition
    • restartEventsToSkip

      private long restartEventsToSkip
    • restartRowsToSkip

      private int restartRowsToSkip
    • inTransaction

      private boolean inTransaction
    • serverId

      private long serverId
    • sourceTime

      private Instant sourceTime
    • threadId

      private long threadId
    • sourcePartition

      private final Map<String,String> sourcePartition
    • lastSnapshot

      private boolean lastSnapshot
    • nextSnapshot

      private boolean nextSnapshot
    • currentQuery

      private String currentQuery
    • databaseIncludeList

      private String databaseIncludeList
    • databaseExcludeList

      private String databaseExcludeList
    • tableIncludeList

      private String tableIncludeList
    • tableExcludeList

      private String tableExcludeList
    • tableIds

      private Set<TableId> tableIds
    • databaseName

      private String databaseName
  • Constructor Details

  • Method Details

    • setQuery

      public void setQuery(String query)
      Set the original SQL query.
      Parameters:
      query - the original SQL query that generated the event.
    • getQuery

      public String getQuery()
      Returns:
      the original SQL query that generated the event; null if no such query is associated.
    • partition

      public Map<String,String> partition()
      Get the Kafka Connect detail about the source "partition", which describes the portion of the source that we are consuming. Since we're reading the binary log for a single database, the source partition specifies the database server.

      The resulting map is mutable for efficiency reasons (this information rarely changes), but should not be mutated.

      Returns:
      the source partition information; never null
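Because the returned map is mutable but must not be mutated, a caller that needs a modified variant can work on a copy or hand out a read-only view. The following is a minimal sketch of that caller-side discipline; `PartitionSketch` and its methods are hypothetical names, and the "server" key is taken from the partition example in the class description.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical caller-side helpers; SourceInfo itself does not provide them.
final class PartitionSketch {

    /** Wraps the partition so that accidental writes fail fast. */
    static Map<String, String> readOnlyView(Map<String, String> partition) {
        return Collections.unmodifiableMap(partition);
    }

    /** Returns a modified copy and leaves the original partition untouched. */
    static Map<String, String> copyWith(Map<String, String> partition, String key, String value) {
        Map<String, String> copy = new HashMap<>(partition);
        copy.put(key, value);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> partition = new HashMap<>();
        partition.put("server", "production-server");
        System.out.println(copyWith(partition, "server", "staging-server"));
        System.out.println(partition); // the original is unchanged
    }
}
```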
    • setBinlogStartPoint

      public void setBinlogStartPoint(String binlogFilename, long positionOfFirstEvent)
      Set the position in the MySQL binlog where we will start reading.
      Parameters:
      binlogFilename - the name of the binary log file; may not be null
      positionOfFirstEvent - the position in the binary log file to begin processing
    • setEventPosition

      public void setEventPosition(long positionOfCurrentEvent, long eventSizeInBytes)
      Set the position within the MySQL binary log file of the current event.
      Parameters:
      positionOfCurrentEvent - the position within the binary log file of the current event
      eventSizeInBytes - the size in bytes of this event
    • offset

      public Map<String,?> offset()
      Get the Kafka Connect detail about the source "offset", which describes the position within the source where we have last read.
      Returns:
      a copy of the current offset; never null
    • offsetForRow

      public Map<String,Object> offsetForRow(int eventRowNumber, int totalNumberOfRows)
      Given the row number within a binlog event and the total number of rows in that event, compute and return the Kafka Connect offset that is to be included in the produced change event describing the row.

      This method should always be called before AbstractSourceInfo.struct().

      Parameters:
      eventRowNumber - the 0-based row number within the event for which the offset is to be produced
      totalNumberOfRows - the total number of rows within the event being processed
      Returns:
      a copy of the current offset; never null
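The row bookkeeping behind a per-row offset can be sketched as follows. This is an illustration under stated assumptions, not the class's actual implementation: `RowOffsetSketch` and `rowsToSkipAfter` are hypothetical names, and the sketch assumes that once the last row of an event has been emitted, the whole event counts as processed, so no rows need to be skipped on restart.

```java
// Hypothetical helper illustrating the "rows to skip" idea for multi-row events.
final class RowOffsetSketch {

    /**
     * Returns how many rows of the current event a restarted reader would skip
     * after the row with the given 0-based number has been emitted.
     */
    static int rowsToSkipAfter(int eventRowNumber, int totalNumberOfRows) {
        if (eventRowNumber < 0 || eventRowNumber >= totalNumberOfRows) {
            throw new IllegalArgumentException("row " + eventRowNumber
                    + " is outside an event with " + totalNumberOfRows + " rows");
        }
        boolean lastRow = eventRowNumber == totalNumberOfRows - 1;
        // Assumption: after the last row the event is complete, so nothing is skipped.
        return lastRow ? 0 : eventRowNumber + 1;
    }

    public static void main(String[] args) {
        for (int row = 0; row < 3; row++) {
            System.out.println("row " + row + " -> skip " + rowsToSkipAfter(row, 3));
        }
    }
}
```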
    • changeEventCompleted

      public void changeEventCompleted()
    • offsetUsingPosition

      private Map<String,Object> offsetUsingPosition(long rowsToSkip)
    • databaseEvent

      public void databaseEvent(String databaseName)
    • tableEvent

      public void tableEvent(Set<TableId> tableIds)
    • tableEvent

      public void tableEvent(TableId tableId)
    • isSnapshotInEffect

      public boolean isSnapshotInEffect()
      Determine whether a snapshot is currently in effect.
      Returns:
      true if a snapshot is in effect, or false otherwise
    • startNextTransaction

      public void startNextTransaction()
    • completeEvent

      public void completeEvent()
      Capture that we're starting a new event.
    • eventsToSkipUponRestart

      public long eventsToSkipUponRestart()
      Get the number of events after the last transaction BEGIN that we've already processed.
      Returns:
      the number of events in the transaction that have been processed completely
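One way to picture how the restart position and the events-to-skip counter interact is the small tracker below. It is a sketch under assumptions: `RestartTrackerSketch` is a hypothetical class whose method names mirror those documented here, and the bookkeeping shown (restart at the BEGIN of an open transaction, or just after the COMMIT of a completed one) is inferred from the class description rather than copied from the implementation.

```java
// Hypothetical tracker mirroring the documented method names; not the real SourceInfo.
final class RestartTrackerSketch {
    private long currentPosition;
    private long currentEventSize;
    private long restartPosition;
    private long eventsToSkip;

    void setEventPosition(long positionOfCurrentEvent, long eventSizeInBytes) {
        this.currentPosition = positionOfCurrentEvent;
        this.currentEventSize = eventSizeInBytes;
    }

    /** A transaction begins: a restart would re-read from this BEGIN event. */
    void startNextTransaction() {
        this.restartPosition = currentPosition;
        this.eventsToSkip = 0;
    }

    /** One more event of the open transaction has been processed completely. */
    void completeEvent() {
        this.eventsToSkip++;
    }

    /** The transaction is done: a restart would begin just after the COMMIT event. */
    void commitTransaction() {
        this.restartPosition = currentPosition + currentEventSize;
        this.eventsToSkip = 0;
    }

    long restartPosition() {
        return restartPosition;
    }

    long eventsToSkipUponRestart() {
        return eventsToSkip;
    }
}
```

With this bookkeeping, a restart in the middle of a transaction re-reads from the BEGIN event and skips the events already processed, while a restart between transactions skips nothing.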
    • commitTransaction

      public void commitTransaction()
    • startGtid

      public void startGtid(String gtid, String gtidSet)
      Record that a new GTID transaction has been started and has been included in the set of GTIDs known to the MySQL server.
      Parameters:
      gtid - the string representation of a specific GTID that has been begun; may not be null
      gtidSet - the string representation of GTID set that includes the newly begun GTID; may not be null
    • setCompletedGtidSet

      public void setCompletedGtidSet(String gtidSet)
      Set the GTID set that captures all of the GTID transactions that have been completely processed.
      Parameters:
      gtidSet - the string representation of the GTID set; may not be null, but may be an empty string if no GTIDs have been previously processed
    • setBinlogServerId

      public void setBinlogServerId(long serverId)
      Set the server ID as found within the MySQL binary log file.
      Parameters:
      serverId - the server ID found within the binary log file
    • setBinlogTimestampSeconds

      public void setBinlogTimestampSeconds(long timestampInSeconds)
      Set the number of seconds since Unix epoch (January 1, 1970) as found within the MySQL binary log file. Note that the value in the binlog events is in seconds, but the library we use returns the value in milliseconds (with only second precision and therefore all fractions of a second are zero). We capture this as seconds since that is the precision that MySQL uses.
      Parameters:
      timestampInSeconds - the timestamp in seconds found within the binary log file
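The unit handling described above can be sketched with plain `java.time` calls. `BinlogTimestampSketch` and its methods are hypothetical names; the sketch assumes, as the description states, that the binlog client library reports event timestamps in milliseconds with only second precision.

```java
import java.time.Instant;

// Hypothetical helper illustrating the milliseconds-to-seconds conversion.
final class BinlogTimestampSketch {

    /** Converts a millisecond timestamp, as reported by the client library, to whole seconds. */
    static long toEpochSeconds(long timestampInMillis) {
        return Math.floorDiv(timestampInMillis, 1000L);
    }

    /** Converts binlog seconds to an Instant, for example to pass to setSourceTime(Instant). */
    static Instant toInstant(long timestampInSeconds) {
        return Instant.ofEpochSecond(timestampInSeconds);
    }

    public static void main(String[] args) {
        long seconds = toEpochSeconds(1465937000L);
        System.out.println(seconds + " -> " + toInstant(seconds));
    }
}
```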
    • setSourceTime

      public void setSourceTime(Instant timestamp)
    • setBinlogThread

      public void setBinlogThread(long threadId)
      Set the identifier of the MySQL thread that generated the most recent event.
      Parameters:
      threadId - the thread identifier; may be negative if not known
    • startSnapshot

      public void startSnapshot()
      Denote that a snapshot is being (or has been) started.
    • markLastSnapshot

      public void markLastSnapshot(Configuration config)
      Denote that a snapshot will be complete after one last record.
    • completeSnapshot

      public void completeSnapshot()
      Denote that a snapshot has completed successfully.
    • setFilterDataFromConfig

      public void setFilterDataFromConfig(Configuration config)
      Set the filter data for the offset from the given Configuration.
      Parameters:
      config - the configuration
    • maybeSetFilterDataFromConfig

      public void maybeSetFilterDataFromConfig(Configuration config)
      Set filter data from config if and only if parallel snapshotting of new tables is turned on.
      Parameters:
      config - the configuration.
    • hasFilterInfo

      public boolean hasFilterInfo()
      Returns:
      true if this offset has filter info, false otherwise.
    • getDatabaseIncludeList

      public String getDatabaseIncludeList()
    • getDatabaseExcludeList

      public String getDatabaseExcludeList()
    • getTableIncludeList

      public String getTableIncludeList()
    • getTableExcludeList

      public String getTableExcludeList()
    • setOffset

      public void setOffset(Map<String,?> sourceOffset)
      Set the source offset, as read from Kafka Connect. This method does nothing if the supplied map is null.
      Parameters:
      sourceOffset - the previously-recorded Kafka Connect source offset
      Throws:
      org.apache.kafka.connect.errors.ConnectException - if any offset parameter values are missing, invalid, or of the wrong type
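Reading a numeric value out of an untyped offset map usually means tolerating several representations and rejecting everything else. The following is a minimal sketch of such validation. `OffsetValueSketch` and `requiredLong` are hypothetical names; the sketch throws `IllegalArgumentException` to stay dependency-free, whereas the documented method throws `ConnectException`, and which keys are actually mandatory is not stated here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical validation helper; not the private longOffsetValue method of SourceInfo.
final class OffsetValueSketch {

    /** Reads a long from the offset map, accepting numbers and numeric strings. */
    static long requiredLong(Map<String, ?> values, String key) {
        Object obj = values.get(key);
        if (obj == null) {
            throw new IllegalArgumentException("Source offset '" + key + "' is missing");
        }
        if (obj instanceof Number) {
            return ((Number) obj).longValue();
        }
        try {
            return Long.parseLong(obj.toString());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "Source offset '" + key + "' is not a valid long: " + obj, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> offset = new HashMap<>();
        offset.put("pos", 990);
        offset.put("row", "0");
        System.out.println(requiredLong(offset, "pos") + ", " + requiredLong(offset, "row"));
    }
}
```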
    • offsetsHaveFilterInfo

      public static boolean offsetsHaveFilterInfo(Map<String,?> sourceOffset)
    • longOffsetValue

      private long longOffsetValue(Map<String,?> values, String key)
    • booleanOffsetValue

      private boolean booleanOffsetValue(Map<String,?> values, String key)
    • gtidSet

      public String gtidSet()
      Get the string representation of the GTID range for the MySQL binary log file.
      Returns:
      the string representation of the binlog GTID ranges; may be null
    • binlogFilename

      public String binlogFilename()
      Get the name of the MySQL binary log file that has last been processed.
      Returns:
      the name of the binary log file; null if it has not been set
    • binlogPosition

      public long binlogPosition()
      Get the position within the MySQL binary log file of the next event to be processed.
      Returns:
      the position within the binary log file
    • restartBinlogPosition

      protected long restartBinlogPosition()
      Get the position within the MySQL binary log file of the most recently processed event.
      Returns:
      the position within the binary log file
    • rowsToSkipUponRestart

      public int rowsToSkipUponRestart()
      Get the number of rows beyond the last completely processed event to be skipped upon restart.
      Returns:
      the number of rows to be skipped
    • getServerId

      long getServerId()
    • getThreadId

      long getThreadId()
    • table

      String table()
      Returns a string representation of the table(s) affected by the current event. Will only represent more than a single table for events in the user-facing schema history topic for certain types of DDL. Will be null for DDL events not applying to tables (CREATE DATABASE etc.).
    • getCurrentGtid

      String getCurrentGtid()
    • isLastSnapshot

      boolean isLastSnapshot()
    • getCurrentBinlogFilename

      String getCurrentBinlogFilename()
    • getCurrentBinlogPosition

      long getCurrentBinlogPosition()
    • getBinlogTimestampSeconds

      long getBinlogTimestampSeconds()
    • getCurrentRowNumber

      int getCurrentRowNumber()
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • createDocumentFromOffset

      public static Document createDocumentFromOffset(Map<String,?> offset)
      Create a Document from the given offset.
      Parameters:
      offset - the offset to create the document from.
      Returns:
      a Document with the offset data.
    • isPositionAtOrBefore

      public static boolean isPositionAtOrBefore(Document recorded, Document desired, Predicate<String> gtidFilter)
      Determine whether the first offset is at or before the point in time of the second offset, where the offsets are given in JSON representation of the maps returned by offset().

      This logic makes a significant assumption: once a MySQL server/cluster has GTIDs enabled, they will never be disabled. This is the only way to compare a position with a GTID to a position without a GTID, and we conclude that any position with a GTID is *after* the position without.

      When both positions have GTIDs, then we compare the positions by using only the GTIDs. Of course, if the GTIDs are the same, then we also look at whether they have snapshots enabled.

      Parameters:
      recorded - the position obtained from recorded history; never null
      desired - the desired position that we want to obtain, which should be after some recorded positions, at some recorded positions, and before other recorded positions; never null
      gtidFilter - the predicate function that will return true if a GTID source is to be included, or false if a GTID source is to be excluded; may be null if no filtering is to be done
      Returns:
      true if the recorded position is at or before the desired position; or false otherwise
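The GTID-presence rule described above can be sketched on plain maps. This is a simplified illustration, not the documented method: `PositionOrderSketch` is a hypothetical class, GTID set containment is delegated to a caller-supplied predicate, the snapshot check is omitted, and the fallback ordering by binlog file number and position for offsets without GTIDs is an assumption of this sketch.

```java
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical comparison helper; works on offset maps rather than Documents.
final class PositionOrderSketch {

    /**
     * Returns true if 'recorded' is at or before 'desired'. The predicate receives
     * (recordedGtids, desiredGtids) and decides whether the first is contained in the second.
     */
    static boolean isAtOrBefore(Map<String, ?> recorded, Map<String, ?> desired,
                                BiPredicate<String, String> gtidSetContainedIn) {
        String recordedGtids = (String) recorded.get("gtids");
        String desiredGtids = (String) desired.get("gtids");
        if (recordedGtids != null && desiredGtids == null) {
            return false; // a position with GTIDs is after any position without them
        }
        if (recordedGtids == null && desiredGtids != null) {
            return true;
        }
        if (recordedGtids != null) {
            return gtidSetContainedIn.test(recordedGtids, desiredGtids); // GTIDs only
        }
        // Assumption: without GTIDs, order by binlog file number, then by position.
        int byFile = Integer.compare(fileNumber(recorded), fileNumber(desired));
        if (byFile != 0) {
            return byFile < 0;
        }
        return longValue(recorded, "pos") <= longValue(desired, "pos");
    }

    /** Extracts the numeric suffix of a name such as "mysql-bin.000003". */
    private static int fileNumber(Map<String, ?> offset) {
        String file = (String) offset.get("file");
        return Integer.parseInt(file.substring(file.lastIndexOf('.') + 1));
    }

    private static long longValue(Map<String, ?> offset, String key) {
        return ((Number) offset.get(key)).longValue();
    }
}
```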
    • timestamp

      protected Instant timestamp()
      Specified by:
      timestamp in class AbstractSourceInfo
    • snapshot

      protected SnapshotRecord snapshot()
      Overrides:
      snapshot in class BaseSourceInfo
    • database

      protected String database()
      Specified by:
      database in class AbstractSourceInfo