Package org.apache.pulsar.io.core
Interface BatchSource<T>
- All Superinterfaces:
AutoCloseable
- All Known Implementing Classes:
BatchPushSource
Interface for writing batch sources.
The lifecycle of a BatchSource is as follows:
1. open - called once when the connector is started. Use this method for
one-time operations such as initialization and setup. It is called on all
instances of the source and is analogous to the open method of the streaming Source API.
2. discover - called on only one instance (currently instance 0, though this might change later)
- The discovery phase is executed on one instance of the connector.
- discover is triggered by the BatchSourceTriggerer class configured for this source.
- As discover finds new tasks, it emits them through the taskEater consumer.
- The framework distributes the discovered tasks among all instances.
3. prepare - called on an instance when a newly discovered task is assigned to that instance
- The framework decides which discovered task is routed to which source instance. The connector
does not currently have a way to influence this.
- prepare is called only after the instance has fetched all records, using readNext, for its previously
assigned discovered task.
4. readNext - called repeatedly by the framework to fetch the next record. If there are no
more records available to emit, the connector should return null. That indicates
that all records for that particular discovered task have been read.
5. close - called when the source is stopped or deleted. This is analogous to close in the streaming Source API.
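The lifecycle above can be sketched with a small, self-contained driver. This is an illustration under stated assumptions, not the Pulsar framework: MiniBatchSource, WordBatchSource and LifecycleDriver are hypothetical stand-ins, records are plain values rather than Pulsar records, there is no source context, tasks are assumed to be byte arrays, and everything runs on a single instance with no task distribution.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for the batch source contract (not the Pulsar type).
interface MiniBatchSource<T> extends AutoCloseable {
    void open(Map<String, Object> config) throws Exception;
    void discover(Consumer<byte[]> taskEater) throws Exception;
    void prepare(byte[] task) throws Exception;
    T readNext() throws Exception; // null means the current task is exhausted
}

// Toy source: each discovered task is a comma-separated list of words.
class WordBatchSource implements MiniBatchSource<String> {
    private final List<String> events = new ArrayList<>();
    private final Queue<String> pending = new ArrayDeque<>();

    List<String> events() { return events; }

    @Override public void open(Map<String, Object> config) { events.add("open"); }

    @Override public void discover(Consumer<byte[]> taskEater) {
        events.add("discover");
        // Emit each task as soon as it is found.
        taskEater.accept("a,b".getBytes(StandardCharsets.UTF_8));
        taskEater.accept("c".getBytes(StandardCharsets.UTF_8));
    }

    @Override public void prepare(byte[] task) {
        String decoded = new String(task, StandardCharsets.UTF_8);
        events.add("prepare:" + decoded);
        for (String word : decoded.split(",")) {
            pending.add(word);
        }
    }

    @Override public String readNext() { return pending.poll(); } // null once drained

    @Override public void close() { events.add("close"); }
}

class LifecycleDriver {
    // Single-instance driver: open, discover, then prepare and drain each task, then close.
    static <T> List<T> run(MiniBatchSource<T> source, Map<String, Object> config) {
        List<T> out = new ArrayList<>();
        try (MiniBatchSource<T> s = source) {
            s.open(config);
            List<byte[]> tasks = new ArrayList<>();
            s.discover(tasks::add);
            for (byte[] task : tasks) {
                s.prepare(task);
                T record;
                while ((record = s.readNext()) != null) {
                    out.add(record);
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("batch source failed", e);
        }
        return out;
    }
}
```

Note that prepare for the second task is only reached after readNext has returned null for the first, which mirrors the ordering guarantee described in step 3.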
Method Summary
Methods inherited from interface java.lang.AutoCloseable
close
Method Details
open
Open connector with configuration.
- Parameters:
config - config that's supplied for source
context - environment where the source connector is running
- Throws:
Exception - IO type exceptions when opening a connector
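As an illustration of the one-time setup that open is meant for, the following sketch validates connector settings eagerly so that a misconfiguration fails at startup rather than in the middle of a task. It assumes the supplied config is a string-keyed map; the class FolderSourceConfig and the keys directory and maxRecordsPerTask are hypothetical and are not part of the Pulsar API.

```java
import java.util.Map;

// Hypothetical typed view over the raw config map passed to open.
final class FolderSourceConfig {
    final String directory;
    final int maxRecordsPerTask;

    private FolderSourceConfig(String directory, int maxRecordsPerTask) {
        this.directory = directory;
        this.maxRecordsPerTask = maxRecordsPerTask;
    }

    // Parse and validate; throws IllegalArgumentException on bad input.
    static FolderSourceConfig fromMap(Map<String, Object> config) {
        Object dir = config.get("directory");
        if (!(dir instanceof String) || ((String) dir).isEmpty()) {
            throw new IllegalArgumentException("'directory' must be a non-empty string");
        }
        // Values may arrive as numbers or as strings, so accept both.
        Object max = config.getOrDefault("maxRecordsPerTask", 1000);
        int maxRecords = (max instanceof Number)
                ? ((Number) max).intValue()
                : Integer.parseInt(max.toString());
        if (maxRecords <= 0) {
            throw new IllegalArgumentException("'maxRecordsPerTask' must be positive");
        }
        return new FolderSourceConfig((String) dir, maxRecords);
    }
}
```

A connector's open implementation could call FolderSourceConfig.fromMap(config) once and keep the result in a field for later use by discover and prepare.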
discover
Discovery phase of a connector. This phase runs on only one instance of the connector, currently instance 0. Implementations use the taskEater consumer to output a serialized representation of each task as it is discovered.
- Parameters:
taskEater - function to notify the framework about a newly discovered task.
- Throws:
Exception - during discover
prepare
Called when a new task appears for this connector instance.
- Parameters:
task - the serialized representation of the task
- Throws:
Exception
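Because a task crosses from discover to prepare in serialized form, the connector needs a matching encoder and decoder. A minimal sketch follows, assuming the serialized task is a byte array; the FileRangeTask class and its fields are hypothetical, and any stable encoding (JSON, Avro, and so on) would serve equally well.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical task: read bytes [start, end) of one file.
final class FileRangeTask {
    final String path;
    final long start;
    final long end;

    FileRangeTask(String path, long start, long end) {
        this.path = path;
        this.start = start;
        this.end = end;
    }

    // Encoder, used on the discover side before handing the task to taskEater.
    // Layout: 4-byte path length, UTF-8 path bytes, 8-byte start, 8-byte end.
    byte[] toBytes() {
        byte[] p = path.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + p.length + 2 * Long.BYTES)
                .putInt(p.length)
                .put(p)
                .putLong(start)
                .putLong(end)
                .array();
    }

    // Decoder, used on the prepare side to recover the task.
    static FileRangeTask fromBytes(byte[] task) {
        ByteBuffer buf = ByteBuffer.wrap(task);
        byte[] p = new byte[buf.getInt()];
        buf.get(p);
        long start = buf.getLong();
        long end = buf.getLong();
        return new FileRangeTask(new String(p, StandardCharsets.UTF_8), start, end);
    }
}
```

Keeping the task self-describing matters here because the instance that runs prepare may not be the one that ran discover, so it cannot rely on any in-memory state from the discovery phase.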
readNext
Read data and return a record. Return null if no more records are present for this task.
- Returns:
a record
- Throws:
Exception