StreamExecutionEnvironment

org.apache.flinkx.api.StreamExecutionEnvironment
See the StreamExecutionEnvironment companion object
@Public
class StreamExecutionEnvironment(javaEnv: StreamExecutionEnvironment)

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def addDefaultKryoSerializer[T <: Serializer[_] & Serializable](`type`: Class[_], serializer: T): Unit

Adds a new Kryo default serializer to the Runtime.

Note that the serializer instance must be serializable (as defined by java.io.Serializable), because it may be distributed to the worker nodes by java serialization.

Value parameters

serializer

The serializer to use.

type

The class of the types serialized with the given serializer.

Attributes

def addDefaultKryoSerializer(`type`: Class[_], serializerClass: Class[_ <: Serializer[_]]): Unit

Adds a new Kryo default serializer to the Runtime.

Value parameters

serializerClass

The class of the serializer to use.

type

The class of the types serialized with the given serializer.

Attributes

def addSource[T : TypeInformation](function: SourceFunction[T]): DataStream[T]

Creates a DataStream using a user-defined source function for arbitrary source functionality. By default, sources have a parallelism of 1. To enable parallel execution, the user-defined source should implement ParallelSourceFunction or extend RichParallelSourceFunction. In these cases the resulting source will have the parallelism of the environment. To change this afterwards, call DataStreamSource.setParallelism(int).

Attributes

def addSource[T : TypeInformation](function: SourceContext[T] => Unit): DataStream[T]

Create a DataStream using a user defined source function for arbitrary source functionality.
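
A minimal sketch of the function-based overload, assuming the flink-scala-api artifacts are on the classpath and that `org.apache.flinkx.api.serializers._` supplies the implicit `TypeInformation[Int]`. The object name, the job name and the emitted values are illustrative only.

```scala
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext
import org.apache.flinkx.api._
import org.apache.flinkx.api.serializers._

object FunctionSourceExample {
  // Builds a stream whose records are emitted by a plain Scala function.
  def numbers(env: StreamExecutionEnvironment): DataStream[Int] =
    env.addSource[Int] { (ctx: SourceContext[Int]) =>
      // Emit the values 1 to 5 through the source context, then return.
      (1 to 5).foreach(i => ctx.collect(i))
    }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    numbers(env).print()
    env.execute("function-source-example") // hypothetical job name
  }
}
```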

Attributes

@PublicEvolving
def clearJobListeners(): Unit

Clear all registered JobListeners.

Attributes

@PublicEvolving
def configure(configuration: ReadableConfig, classLoader: ClassLoader): Unit

Sets all relevant options contained in the ReadableConfig, such as org.apache.flink.streaming.api.environment.StreamPipelineOptions#TIME_CHARACTERISTIC. It will reconfigure StreamExecutionEnvironment, org.apache.flink.api.common.ExecutionConfig and org.apache.flink.streaming.api.environment.CheckpointConfig.

It will change the value of a setting only if a corresponding option was set in the configuration. If a key is not present, the current value of a field will remain untouched.

Value parameters

classLoader

a class loader to use when loading classes

configuration

a configuration to read the values from

Attributes

@PublicEvolving
def configure(configuration: ReadableConfig): Unit

Sets all relevant options contained in the ReadableConfig, such as org.apache.flink.streaming.api.environment.StreamPipelineOptions#TIME_CHARACTERISTIC. It will reconfigure StreamExecutionEnvironment, org.apache.flink.api.common.ExecutionConfig and org.apache.flink.streaming.api.environment.CheckpointConfig.

It will change the value of a setting only if a corresponding option was set in the configuration. If a key is not present, the current value of a field will remain untouched.

Value parameters

configuration

a configuration to read the values from
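
The "only keys that are present are applied" behaviour can be sketched as follows. This assumes Flink's `Configuration` (which implements ReadableConfig) and the `PipelineOptions.MAX_PARALLELISM` option; the values 2 and 128 are arbitrary.

```scala
import org.apache.flink.configuration.{Configuration, PipelineOptions}
import org.apache.flinkx.api._

object ConfigureExample {
  def configuredEnv(): StreamExecutionEnvironment = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(2)

    // Only the maximum parallelism is present in this configuration.
    val conf = new Configuration()
    conf.set(PipelineOptions.MAX_PARALLELISM, Int.box(128))

    // The default parallelism set above is not part of conf, so it should stay at 2.
    env.configure(conf)
    env
  }
}
```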

Attributes

@PublicEvolving
def createInput[T : TypeInformation](inputFormat: InputFormat[T, _]): DataStream[T]

Generic method to create an input data stream with a specific input format. Since all data streams need specific information about their types, this method needs to determine the type of the data produced by the input format. It will attempt to determine the data type by reflection, unless the input format implements the ResultTypeQueryable interface.

Attributes

Disables operator chaining for streaming operators. Operator chaining allows non-shuffle operations to be co-located in the same thread fully avoiding serialization and de-serialization.

Attributes

@PublicEvolving
def enableChangelogStateBackend(enabled: Boolean): StreamExecutionEnvironment

Enable the change log for current state backend. This change log allows operators to persist state changes in a very fine-grained manner. Currently, the change log only applies to keyed state, so non-keyed operator state and channel state are persisted as usual. The 'state' here refers to 'keyed state'. Details are as follows:

  • Stateful operators write the state changes to that log (logging the state), in addition to applying them to the state tables in RocksDB or the in-mem Hashtable.
  • An operator can acknowledge a checkpoint as soon as the changes in the log have reached the durable checkpoint storage.
  • The state tables are persisted periodically, independent of the checkpoints. We call this the materialization of the state on the checkpoint storage.
  • Once the state is materialized on checkpoint storage, the state changelog can be truncated to the corresponding point.

This establishes a way to drastically reduce the checkpoint interval for streaming applications across state backends. For more details, see FLIP-158.

If this method is not called explicitly, no preference for enabling the change log is expressed. Settings that enable the change log override one another across configuration levels (job/local/cluster).

Value parameters

enabled

true to explicitly enable the change log for the state backend, false to disable it.

Attributes

Returns

This StreamExecutionEnvironment itself, to allow chaining of function calls.

See also

#isChangelogStateBackendEnabled()
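
A short sketch of toggling the change log and reading the setting back. It assumes the method is exposed as `enableChangelogStateBackend(enabled: Boolean)`, matching the `enabled` parameter described above; `TernaryBoolean` is Flink's `org.apache.flink.util.TernaryBoolean`.

```scala
import org.apache.flink.util.TernaryBoolean
import org.apache.flinkx.api._

object ChangelogExample {
  def changelogEnv(): StreamExecutionEnvironment = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Explicitly express a preference; without this call the status stays undefined.
    env.enableChangelogStateBackend(true)
    env
  }

  def main(args: Array[String]): Unit = {
    val status: TernaryBoolean = changelogEnv().isChangelogStateBackendEnabled
    println(s"Changelog state backend enabled: $status")
  }
}
```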

def enableCheckpointing(interval: Long, mode: CheckpointingMode): StreamExecutionEnvironment

Enables checkpointing for the streaming job. The distributed state of the streaming dataflow will be periodically snapshotted. In case of a failure, the streaming dataflow will be restarted from the latest completed checkpoint.

The job draws checkpoints periodically, in the given interval. The system uses the given CheckpointingMode for the checkpointing ("exactly once" vs "at least once"). The state will be stored in the configured state backend.

NOTE: Checkpointing iterative streaming dataflows is not properly supported at the moment. For that reason, iterative jobs will not be started if checkpointing is enabled. To override this mechanism, use the enableCheckpointing(long, CheckpointingMode, boolean) method.

Value parameters

interval

Time interval between state checkpoints in milliseconds.

mode

The checkpointing mode, selecting between "exactly once" and "at least once" guarantees.
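
As an illustration, the sketch below enables exactly-once checkpoints every 10 seconds and then tunes the returned CheckpointConfig. It assumes the Flink 1.x location of `CheckpointingMode` (`org.apache.flink.streaming.api`); the interval and pause values are arbitrary.

```scala
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flinkx.api._

object CheckpointingExample {
  def checkpointedEnv(): StreamExecutionEnvironment = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Snapshot the distributed state every 10 000 ms with exactly-once guarantees.
    env.enableCheckpointing(10000L, CheckpointingMode.EXACTLY_ONCE)

    // Further settings live on the CheckpointConfig, for example a minimum pause of 500 ms.
    env.getCheckpointConfig.setMinPauseBetweenCheckpoints(500L)
    env
  }
}
```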

Attributes

def enableCheckpointing(interval: Long): StreamExecutionEnvironment

Enables checkpointing for the streaming job. The distributed state of the streaming dataflow will be periodically snapshotted. In case of a failure, the streaming dataflow will be restarted from the latest completed checkpoint.

The job draws checkpoints periodically, in the given interval. The program will use CheckpointingMode.EXACTLY_ONCE mode. The state will be stored in the configured state backend.

NOTE: Checkpointing iterative streaming dataflows is not properly supported at the moment. For that reason, iterative jobs will not be started if checkpointing is enabled. To override this mechanism, use the enableCheckpointing(long, CheckpointingMode, boolean) method.

Value parameters

interval

Time interval between state checkpoints in milliseconds.

Attributes

def execute(): JobExecutionResult

Triggers the program execution. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are for example printing results or forwarding them to a message queue.

The program execution will be logged and displayed with a generated default name.

Attributes

Returns

The result of the job execution, containing elapsed time and accumulators.

def execute(jobName: String): JobExecutionResult

Triggers the program execution. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are for example printing results or forwarding them to a message queue.

The program execution will be logged and displayed with the provided name.

Attributes

Returns

The result of the job execution, containing elapsed time and accumulators.
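
A minimal end-to-end sketch: nothing runs until `execute` is called, and only the part of the program that ends in a sink (here `print()`) is executed. The job name and the sample elements are illustrative, and the implicit type information is assumed to come from `org.apache.flinkx.api.serializers._`.

```scala
import org.apache.flinkx.api._
import org.apache.flinkx.api.serializers._

object ExecuteExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Source -> transformation -> sink. The pipeline is only declared here.
    env
      .fromElements("a", "b", "c")
      .map(_.toUpperCase)
      .print()

    // Blocks until the bounded job finishes and returns its result.
    val result = env.execute("uppercase-job") // hypothetical job name
    println(s"Net runtime: ${result.getNetRuntime} ms")
  }
}
```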

@PublicEvolving
def executeAsync(): JobClient

Triggers the program execution asynchronously. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are for example printing results or forwarding them to a message queue.

The program execution will be logged and displayed with a generated default name.

ATTENTION: The caller of this method is responsible for managing the lifecycle of the returned JobClient. This means calling JobClient#close() at the end of its usage. Otherwise, there may be resource leaks depending on the JobClient implementation.

Attributes

Returns

A JobClient that can be used to communicate with the submitted job. It is returned once the submission has succeeded.

@PublicEvolving
def executeAsync(jobName: String): JobClient

Triggers the program execution asynchronously. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are for example printing results or forwarding them to a message queue.

The program execution will be logged and displayed with the provided name.

ATTENTION: The caller of this method is responsible for managing the lifecycle of the returned JobClient. This means calling JobClient#close() at the end of its usage. Otherwise, there may be resource leaks depending on the JobClient implementation.

Attributes

Returns

A JobClient that can be used to communicate with the submitted job. It is returned once the submission has succeeded.

def fromCollection[T : TypeInformation](data: Seq[T]): DataStream[T]

Creates a DataStream from the given non-empty Seq. The elements need to be serializable because the framework may move the elements into the cluster if needed.

Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.

Attributes

def fromCollection[T : TypeInformation](data: Iterator[T]): DataStream[T]

Creates a DataStream from the given Iterator.

Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.

Attributes

def fromElements[T : TypeInformation](data: T*): DataStream[T]

Creates a DataStream that contains the given elements. The elements must all be of the same type.

Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.

Attributes

def fromParallelCollection[T : TypeInformation](data: SplittableIterator[T]): DataStream[T]

Creates a DataStream from the given SplittableIterator.

Attributes

def fromSequence(from: Long, to: Long): DataStream[Long]

Creates a new data stream that contains a sequence of numbers (longs) and is useful for testing and for cases that just need a stream of N events of any kind.

The generated source splits the sequence into as many parallel sub-sequences as there are parallel source readers. Each sub-sequence will be produced in order. If the parallelism is limited to one, the source will produce one sequence in order.

This source is always bounded. For very long sequences (for example, over the entire domain of long integer values), consider executing the application in streaming mode, because the end bound is very far away.

Use fromSource(Source, WatermarkStrategy, String) together with NumberSequenceSource if you require more control over the created sources, for example if you want to set a WatermarkStrategy.
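
The two routes can be compared in a small sketch. It assumes Flink's `NumberSequenceSource`, `WatermarkStrategy.noWatermarks` and `Types.LONG`; the explicit implicit for boxed `java.lang.Long`, the source name and the bounds are choices made for this example only.

```scala
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.typeinfo.{TypeInformation, Types}
import org.apache.flink.api.connector.source.lib.NumberSequenceSource
import org.apache.flinkx.api._

object SequenceExample {
  // NumberSequenceSource produces boxed java.lang.Long records, so type information is given explicitly.
  implicit val boxedLongInfo: TypeInformation[java.lang.Long] = Types.LONG

  // Simple route: a bounded stream of the numbers 1 to 100.
  def simple(env: StreamExecutionEnvironment): DataStream[Long] =
    env.fromSequence(1L, 100L)

  // Explicit route: same numbers, but the watermark strategy and source name are under our control.
  def custom(env: StreamExecutionEnvironment): DataStream[java.lang.Long] =
    env.fromSource[java.lang.Long](
      new NumberSequenceSource(1L, 100L),
      WatermarkStrategy.noWatermarks[java.lang.Long](),
      "number-sequence" // hypothetical source name
    )

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    simple(env).print()
    custom(env).print()
    env.execute("sequence-example")
  }
}
```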

Attributes

@Experimental
def fromSource[T : TypeInformation](source: Source[T, _ <: SourceSplit, _], watermarkStrategy: WatermarkStrategy[T], sourceName: String): DataStream[T]

Create a DataStream using a Source.

Attributes

def getBufferTimeout: Long

Gets the default buffer timeout set for this environment.

Attributes

def getCachedFiles: List[Tuple2[String, DistributedCacheEntry]]

Gets cache files.

Attributes

def getCheckpointConfig: CheckpointConfig

Gets the checkpoint config, which defines values like checkpoint interval, delay between checkpoints, etc.

Attributes

def getCheckpointingMode: CheckpointingMode
def getConfig: ExecutionConfig

Gets the config object.

Attributes

@Internal
def getConfiguration: ReadableConfig

Gives read-only access to the underlying configuration of this environment.

Note that the returned configuration might not be complete. It only contains options that have initialized the environment or options that are not represented in dedicated configuration classes such as ExecutionConfig or CheckpointConfig.

Use configure to set options that are specific to this environment.

Attributes

@PublicEvolving

Gets the default savepoint directory for this Job.

Attributes

See also

#setDefaultSavepointDirectory(Path)

def getExecutionPlan: String

Creates the plan with which the system will execute the program, and returns it as a String using a JSON representation of the execution data flow graph. Note that this needs to be called before the plan is executed.
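
For instance, the plan can be inspected without running the job. This sketch assumes the implicit type information from `org.apache.flinkx.api.serializers._`; the pipeline itself is arbitrary.

```scala
import org.apache.flinkx.api._
import org.apache.flinkx.api.serializers._

object ExecutionPlanExample {
  // Declares a small pipeline and returns its JSON plan instead of executing it.
  def planJson(): String = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.fromElements(1, 2, 3).map(_ * 2).print()
    env.getExecutionPlan
  }

  def main(args: Array[String]): Unit =
    println(planJson())
}
```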

Attributes

def getJavaEnv: StreamExecutionEnvironment

Attributes

Returns

the wrapped Java environment

@PublicEvolving
def getJobListeners: List[JobListener]

Gets the configured JobListeners.

Attributes

Returns the maximum degree of parallelism defined for the program.

The maximum degree of parallelism specifies the upper limit for dynamic scaling. It also defines the number of key groups used for partitioned state.

Attributes

@PublicEvolving

Gets the number of times the system will try to re-execute failed tasks. A value of "-1" indicates that the system default value (as defined in the configuration) should be used.

Attributes

def getParallelism: Int

Returns the default parallelism for this execution environment. Note that this value can be overridden by individual operations using DataStream#setParallelism.

Attributes

@PublicEvolving
def getRestartStrategy: RestartStrategyConfiguration

Returns the specified restart strategy configuration.

Attributes

Returns

The restart strategy configuration to be used

@PublicEvolving
def getStateBackend: StateBackend

Returns the state backend that defines how to store and checkpoint state.

Attributes

@Internal
def getStreamGraph: StreamGraph

Getter of the org.apache.flink.streaming.api.graph.StreamGraph of the streaming job. This call clears previously registered transformations.

Attributes

Returns

The StreamGraph representing the transformations

@Internal
def getStreamGraph(clearTransformations: Boolean): StreamGraph

Getter of the org.apache.flink.streaming.api.graph.StreamGraph of the streaming job with the option to clear previously registered transformations. Clearing the transformations makes it possible, for example, to avoid re-executing the same operations when calling execute multiple times.

Value parameters

clearTransformations

Whether or not to clear previously registered transformations

Attributes

Returns

The StreamGraph representing the transformations

@PublicEvolving
def getStreamTimeCharacteristic: TimeCharacteristic

Gets the time characteristic.

Attributes

Returns

The time characteristic.

See also

#setStreamTimeCharacteristic

@Internal
def getWrappedStreamExecutionEnvironment: StreamExecutionEnvironment

Getter of the wrapped org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

Attributes

Returns

The encased ExecutionEnvironment

@PublicEvolving
def isChangelogStateBackendEnabled: TernaryBoolean

Gets whether the change log is enabled for the state backend.

Attributes

Returns

A TernaryBoolean indicating whether the change log is enabled for the state backend. It can be TernaryBoolean#UNDEFINED if the user never specified this by calling enableChangelogStateBackend.

Returns whether Unaligned Checkpoints are force-enabled.

Attributes

Returns whether Unaligned Checkpoints are enabled.

Attributes

def readFile[T : TypeInformation](inputFormat: FileInputFormat[T], filePath: String): DataStream[T]

Reads the given file with the given input format. The file path should be passed as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").

Attributes

@PublicEvolving
def readFile[T : TypeInformation](inputFormat: FileInputFormat[T], filePath: String, watchType: FileProcessingMode, interval: Long): DataStream[T]

Reads the contents of the user-specified path based on the given FileInputFormat. Depending on the provided FileProcessingMode, the source may periodically monitor (every interval ms) the path for new data (FileProcessingMode.PROCESS_CONTINUOUSLY), or process once the data currently in the path and exit (FileProcessingMode.PROCESS_ONCE). In addition, if the path contains files not to be processed, the user can specify a custom FilePathFilter. As a default implementation you can use FilePathFilter.createDefaultFilter().

NOTES ON CHECKPOINTING: If the watchType is set to FileProcessingMode#PROCESS_ONCE, the source monitors the path once, creates the FileInputSplits to be processed, forwards them to the downstream readers to read the actual data, and exits without waiting for the readers to finish reading. This implies that no more checkpoint barriers are forwarded after the source exits, so no checkpoints are taken after that point.

Value parameters

filePath

The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path")

inputFormat

The input format used to create the data stream

interval

In the case of periodic path monitoring, this specifies the interval (in millis) between consecutive path scans

watchType

The mode in which the source should operate, i.e. monitor path and react to new data, or process once and exit

Attributes

Returns

The data stream that represents the data read from the given file
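
A sketch of continuous directory monitoring, assuming Flink 1.x's `TextInputFormat` and `FileProcessingMode`. The directory URI, the 5000 ms scan interval and the job name are hypothetical.

```scala
import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.source.FileProcessingMode
import org.apache.flinkx.api._
import org.apache.flinkx.api.serializers._

object WatchDirectoryExample {
  // Re-scans the path every 5000 ms and emits the lines of newly discovered data.
  def lines(env: StreamExecutionEnvironment, dir: String): DataStream[String] = {
    val format = new TextInputFormat(new Path(dir))
    env.readFile(format, dir, FileProcessingMode.PROCESS_CONTINUOUSLY, 5000L)
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    lines(env, "file:///tmp/input").print() // hypothetical directory
    env.execute("watch-directory") // runs until cancelled in PROCESS_CONTINUOUSLY mode
  }
}
```

Switching the mode to FileProcessingMode.PROCESS_ONCE makes the same pipeline read the current contents and finish, subject to the checkpointing caveat noted above.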

def readTextFile(filePath: String): DataStream[String]

Creates a DataStream that represents the Strings produced by reading the given file line by line. The file will be read with the system's default character set.

Attributes

def readTextFile(filePath: String, charsetName: String): DataStream[String]

Creates a DataStream that represents the Strings produced by reading the given file line by line. The character set with the given name will be used to read the files.

Attributes

def registerCachedFile(filePath: String, name: String): Unit

Registers a file at the distributed cache under the given name. The file will be accessible from any user-defined function in the (distributed) runtime under a local path. Files may be local files (which will be distributed via BlobServer), or files in a distributed file system. The runtime will copy the files temporarily to a local cache, if needed.

The org.apache.flink.api.common.functions.RuntimeContext can be obtained inside UDFs via org.apache.flink.api.common.functions.RichFunction#getRuntimeContext and provides access to the org.apache.flink.api.common.cache.DistributedCache via org.apache.flink.api.common.functions.RuntimeContext#getDistributedCache.

Value parameters

filePath

The path of the file, as a URI (e.g. "file:///some/path" or "hdfs://host:port/and/path")

name

The name under which the file is registered.
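
The registration and the lookup inside a user-defined function might look like the sketch below. The file URI, the cache name "lookup" and the `CachedFileSize` class are hypothetical; `DistributedCache#getFile` is the Flink call that resolves the local copy.

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flinkx.api._
import org.apache.flinkx.api.serializers._

// Looks up the locally cached copy through the runtime context for each record.
class CachedFileSize extends RichMapFunction[String, String] {
  override def map(value: String): String = {
    val file = getRuntimeContext.getDistributedCache.getFile("lookup")
    s"$value -> ${file.length()} bytes in cached file"
  }
}

object CachedFileExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Register the file under the name that the function uses for its lookup.
    env.registerCachedFile("file:///tmp/lookup.txt", "lookup") // hypothetical path

    env.fromElements("a", "b").map(new CachedFileSize).print()
    env.execute("cached-file-example")
  }
}
```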

Attributes

def registerCachedFile(filePath: String, name: String, executable: Boolean): Unit

Registers a file at the distributed cache under the given name. The file will be accessible from any user-defined function in the (distributed) runtime under a local path. Files may be local files (which will be distributed via BlobServer), or files in a distributed file system. The runtime will copy the files temporarily to a local cache, if needed.

The org.apache.flink.api.common.functions.RuntimeContext can be obtained inside UDFs via org.apache.flink.api.common.functions.RichFunction#getRuntimeContext and provides access to the org.apache.flink.api.common.cache.DistributedCache via org.apache.flink.api.common.functions.RuntimeContext#getDistributedCache.

Value parameters

executable

flag indicating whether the file should be executable

filePath

The path of the file, as a URI (e.g. "file:///some/path" or "hdfs://host:port/and/path")

name

The name under which the file is registered.

Attributes

@PublicEvolving
def registerJobListener(jobListener: JobListener): Unit

Registers a JobListener in this environment. The JobListener will be notified of specific job status changes.

Attributes

@PublicEvolving
def registerSlotSharingGroup(slotSharingGroup: SlotSharingGroup): StreamExecutionEnvironment

Register a slot sharing group with its resource spec.

Note that a slot sharing group hints to the scheduler that the grouped operators CAN be deployed into a shared slot. There is no guarantee that the scheduler always deploys the grouped operators together. In cases where grouped operators are deployed into separate slots, the slot resources will be derived from the specified group requirements.

Value parameters

slotSharingGroup

The slot sharing group, which contains a name and its resource spec.

Attributes

def registerType(typeClass: Class[_]): Unit

Registers the given type with the serialization stack. If the type is eventually serialized as a POJO, then the type is registered with the POJO serializer. If the type ends up being serialized with Kryo, then it will be registered at Kryo to make sure that only tags are written.

Attributes

def registerTypeWithKryoSerializer[T <: Serializer[_] & Serializable](clazz: Class[_], serializer: T): Unit

Registers the given type with the serializer at the KryoSerializer.

Note that the serializer instance must be serializable (as defined by java.io.Serializable), because it may be distributed to the worker nodes by java serialization.

Attributes

def registerTypeWithKryoSerializer(clazz: Class[_], serializer: Class[_ <: Serializer[_]]): Unit

Registers the given type with the serializer at the KryoSerializer.

Attributes

def setBufferTimeout(timeoutMillis: Long): StreamExecutionEnvironment

Sets the maximum time frequency (milliseconds) for the flushing of the output buffers. By default, the output buffers flush frequently to provide low latency and to aid a smooth developer experience. Setting the parameter can result in three logical modes:

  • A positive integer triggers flushing periodically at that interval (in milliseconds)
  • 0 triggers flushing after every record thus minimizing latency
  • -1 triggers flushing only when the output buffer is full thus maximizing throughput
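
The three modes map onto calls like the ones in this sketch. The 100 ms value is an arbitrary example, and the object and method names are illustrative.

```scala
import org.apache.flinkx.api._

object BufferTimeoutExample {
  // Periodic flushing: buffers are flushed at least every 100 ms.
  def periodic(): StreamExecutionEnvironment =
    StreamExecutionEnvironment.getExecutionEnvironment.setBufferTimeout(100L)

  // Lowest latency: flush after every record.
  def lowLatency(): StreamExecutionEnvironment =
    StreamExecutionEnvironment.getExecutionEnvironment.setBufferTimeout(0L)

  // Highest throughput: flush only when a buffer is full.
  def highThroughput(): StreamExecutionEnvironment =
    StreamExecutionEnvironment.getExecutionEnvironment.setBufferTimeout(-1L)
}
```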

Attributes

@PublicEvolving
def setDefaultSavepointDirectory(savepointDirectory: String): StreamExecutionEnvironment

Sets the default savepoint directory, where savepoints will be written if none is explicitly provided when triggered.

Attributes

Returns

This StreamExecutionEnvironment itself, to allow chaining of function calls.

See also

#getDefaultSavepointDirectory()

@PublicEvolving

Sets the default savepoint directory, where savepoints will be written if none is explicitly provided when triggered.

Attributes

Returns

This StreamExecutionEnvironment itself, to allow chaining of function calls.

See also

#getDefaultSavepointDirectory()

@PublicEvolving

Sets the default savepoint directory, where savepoints will be written if none is explicitly provided when triggered.

Attributes

Returns

This StreamExecutionEnvironment itself, to allow chaining of function calls.

See also

#getDefaultSavepointDirectory()

def setMaxParallelism(maxParallelism: Int): Unit

Sets the maximum degree of parallelism defined for the program. The maximum degree of parallelism specifies the upper limit for dynamic scaling. It also defines the number of key groups used for partitioned state.

Attributes

@PublicEvolving
def setNumberOfExecutionRetries(numRetries: Int): Unit

Sets the number of times that failed tasks are re-executed. A value of zero effectively disables fault tolerance. A value of "-1" indicates that the system default value (as defined in the configuration) should be used.

Attributes

def setParallelism(parallelism: Int): Unit

Sets the parallelism for operations executed through this environment. Setting a parallelism of x here will cause all operators (such as join, map, reduce) to run with x parallel instances. This value can be overridden by specific operations using DataStream#setParallelism.
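
The relationship between setParallelism and setMaxParallelism can be sketched as below. This is an illustrative sketch, assuming the companion object exposes getExecutionEnvironment and the class exposes getParallelism and getMaxParallelism getters; the values 4 and 128 and the name ParallelismSketch are arbitrary choices for the example.

```scala
import org.apache.flinkx.api.StreamExecutionEnvironment

object ParallelismSketch {
  def configure(): StreamExecutionEnvironment = {
    // Assumption: the companion object provides getExecutionEnvironment.
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Default for all operators: 4 parallel instances each.
    env.setParallelism(4)
    // Upper limit for rescaling, and the number of key groups for partitioned state.
    env.setMaxParallelism(128)
    // Both setters return Unit, so the environment is returned explicitly.
    env
  }
}
```

Because both setters return Unit, they cannot be chained in the way setBufferTimeout or setStateBackend can.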

Attributes

@PublicEvolving
def setRestartStrategy(restartStrategyConfiguration: RestartStrategyConfiguration): Unit

Sets the restart strategy configuration. The configuration specifies which restart strategy will be used for the execution graph in case of a restart.
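
As an illustration, a fixed-delay strategy could be configured as sketched below. The sketch assumes a Flink 1.x dependency that still ships org.apache.flink.api.common.restartstrategy.RestartStrategies and a companion object that exposes getExecutionEnvironment; the retry count, the delay, and the name RestartStrategySketch are arbitrary.

```scala
import org.apache.flink.api.common.restartstrategy.RestartStrategies
import org.apache.flinkx.api.StreamExecutionEnvironment

object RestartStrategySketch {
  def configure(): StreamExecutionEnvironment = {
    // Assumption: the companion object provides getExecutionEnvironment.
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Restart up to 3 times, waiting 10000 ms between attempts.
    env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 10000L))
    // setRestartStrategy returns Unit, so the environment is returned explicitly.
    env
  }
}
```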

Value parameters

restartStrategyConfiguration

Restart strategy configuration to be set

Attributes

@PublicEvolving
def setRuntimeMode(executionMode: RuntimeExecutionMode): StreamExecutionEnvironment

Sets the runtime execution mode for the application (see RuntimeExecutionMode). This is equivalent to setting the "execution.runtime-mode" in your application's configuration file.

We recommend that users NOT use this method and instead set "execution.runtime-mode" on the command line when submitting the application. Keeping the application code configuration-free allows more flexibility, because the same application can then be executed in any execution mode.
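
For cases where the mode does have to be fixed in code (for example in a test), a minimal sketch could look like this. It assumes the companion object exposes getExecutionEnvironment; RuntimeModeSketch and batchEnv are hypothetical names.

```scala
import org.apache.flink.api.common.RuntimeExecutionMode
import org.apache.flinkx.api.StreamExecutionEnvironment

object RuntimeModeSketch {
  // Pins the application to batch execution, regardless of submitted configuration.
  def batchEnv(): StreamExecutionEnvironment =
    // Assumption: the companion object provides getExecutionEnvironment.
    StreamExecutionEnvironment.getExecutionEnvironment
      .setRuntimeMode(RuntimeExecutionMode.BATCH)
}
```

The recommended alternative leaves the code untouched and passes the option at submission time, for example with -Dexecution.runtime-mode=BATCH on the flink run command line.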

Value parameters

executionMode

the desired execution mode.

Attributes

Returns

The execution environment of your application.

@PublicEvolving
def setStateBackend(backend: StateBackend): StreamExecutionEnvironment

Sets the state backend that describes how to store operator state. It defines the data structures that hold state during execution (for example hash tables, RocksDB, or other data stores).

State managed by the state backend includes both keyed state that is accessible on org.apache.flink.api.KeyedStream, as well as state maintained directly by the user code that implements org.apache.flink.streaming.api.checkpoint.CheckpointedFunction.

The org.apache.flink.runtime.state.hashmap.HashMapStateBackend maintains state in heap memory, as objects. It is lightweight without extra dependencies, but is limited to JVM heap memory.

In contrast, the EmbeddedRocksDBStateBackend stores its state in an embedded RocksDB instance. This state backend can store very large state that exceeds memory and spills to local disk. All key/value state (including windows) is stored in the key/value index of RocksDB.

In both cases, fault tolerance is managed via the job's org.apache.flink.runtime.state.CheckpointStorage, which configures how and where state backends persist their state during a checkpoint.
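
Selecting the heap-based backend could be sketched as follows. This is a sketch under the assumption that the companion object exposes getExecutionEnvironment; StateBackendSketch and heapBackedEnv are hypothetical names. The RocksDB variant is omitted because it requires an additional dependency that this page does not describe.

```scala
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend
import org.apache.flinkx.api.StreamExecutionEnvironment

object StateBackendSketch {
  def heapBackedEnv(): StreamExecutionEnvironment = {
    // Assumption: the companion object provides getExecutionEnvironment.
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Keep state as objects on the JVM heap.
    env.setStateBackend(new HashMapStateBackend())
  }
}
```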

Attributes

Returns

This StreamExecutionEnvironment itself, to allow chaining of function calls.

See also

#getStateBackend()

@PublicEvolving
def socketTextStream(hostname: String, port: Int, delimiter: Char, maxRetry: Long): DataStream[String]

Creates a new DataStream that contains the strings received indefinitely from a socket. Received strings are decoded using the system's default character set. The maximum retry interval is specified in seconds; in case of a temporary service outage, reconnection is attempted every second.
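
A minimal sketch of building such a source is shown below. The host, port, and retry interval are hypothetical values, and SocketSourceSketch is an illustrative name; the sketch only defines the stream and does not execute a job.

```scala
import org.apache.flinkx.api.{DataStream, StreamExecutionEnvironment}

object SocketSourceSketch {
  // Hypothetical endpoint: newline-delimited text on localhost:9999,
  // with a maximum retry interval of 5 seconds.
  def lines(env: StreamExecutionEnvironment): DataStream[String] =
    env.socketTextStream("localhost", 9999, '\n', 5L)
}
```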

Attributes

Deprecated methods

def generateSequence(from: Long, to: Long): DataStream[Long]

Creates a new DataStream that contains a sequence of numbers. This source is a parallel source. If you manually set the parallelism to 1 the emitted elements are in order.

Attributes

Deprecated

Use fromSequence(long, long) instead to create a new data stream that contains NumberSequenceSource.