Class SparkJob
- java.lang.Object
-
- org.openstreetmap.atlas.utilities.runtime.Command
-
- org.openstreetmap.atlas.generator.tools.spark.SparkJob
-
- All Implemented Interfaces:
java.io.Serializable
- Direct Known Subclasses:
AtlasGenerator,EmptySparkJob,ShardedSparkJob
public abstract class SparkJob extends org.openstreetmap.atlas.utilities.runtime.Command implements java.io.Serializable
Skeleton for a Spark job.
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.openstreetmap.atlas.utilities.runtime.Command
org.openstreetmap.atlas.utilities.runtime.Command.Flag, org.openstreetmap.atlas.utilities.runtime.Command.Optionality, org.openstreetmap.atlas.utilities.runtime.Command.Switch<T extends java.lang.Object>, org.openstreetmap.atlas.utilities.runtime.Command.SwitchList
-
-
Field Summary
Fields
Modifier and Type, Field
static org.openstreetmap.atlas.utilities.runtime.Command.Switch<java.util.Map<java.lang.String,java.lang.String>> ADDITIONAL_SPARK_OPTIONS
static org.openstreetmap.atlas.utilities.runtime.Command.Switch<java.lang.String> COMPRESS_OUTPUT
static java.lang.String FAILED_FILE
static org.openstreetmap.atlas.utilities.runtime.Command.Switch<java.lang.String> INPUT
static org.openstreetmap.atlas.utilities.runtime.Command.Switch<java.lang.String> MASTER
static org.openstreetmap.atlas.utilities.runtime.Command.Switch<java.lang.String> OUTPUT
static java.lang.String SAVING_SEPARATOR
static org.openstreetmap.atlas.utilities.runtime.Command.Switch<SparkContextProvider> SPARK_CONTEXT_PROVIDER
static org.openstreetmap.atlas.utilities.runtime.Command.Switch<java.util.Map<java.lang.String,java.lang.String>> SPARK_OPTIONS
static java.lang.String SUCCESS_FILE
-
Constructor Summary
Constructors
SparkJob()
-
Method Summary
Methods
Modifier and Type, Method, Description
protected org.apache.hadoop.conf.Configuration configuration()
protected java.util.Map<java.lang.String,java.lang.String> configurationMap()
protected void copyToOutput(org.openstreetmap.atlas.utilities.runtime.CommandMap command, java.lang.String input, java.lang.String output)
protected java.lang.String getAlternateParallelFolderOutput(java.lang.String output, java.lang.String name)
    Get an alternate output based on the main output folder used for monitoring.
protected java.lang.String getAlternateSubFolderOutput(java.lang.String output, java.lang.String name)
    Get an alternate output based on the main output folder used for monitoring.
protected org.apache.spark.api.java.JavaSparkContext getContext()
abstract java.lang.String getName()
protected java.lang.String input(org.openstreetmap.atlas.utilities.runtime.CommandMap command)
int onRun(org.openstreetmap.atlas.utilities.runtime.CommandMap command)
protected java.lang.String output(org.openstreetmap.atlas.utilities.runtime.CommandMap command)
protected java.util.List<java.lang.String> outputToClean(org.openstreetmap.atlas.utilities.runtime.CommandMap command)
    Define all the folders to clean before a run.
protected org.openstreetmap.atlas.streaming.resource.Resource resource(java.lang.String path)
static org.openstreetmap.atlas.streaming.resource.Resource resource(java.lang.String path, java.util.Map<java.lang.String,java.lang.String> configurationMap)
protected void setContext(org.apache.spark.api.java.JavaSparkContext context)
protected <T> void splitAndSaveAsHadoopFile(org.apache.spark.api.java.JavaPairRDD<java.lang.String,T> input, java.lang.String path, java.lang.Class<T> valueClass, java.lang.Class<? extends org.apache.hadoop.mapred.lib.MultipleOutputFormat<java.lang.String,T>> formatterClass, java.util.function.UnaryOperator<java.lang.String> keyReducer)
    Instead of saving a full RDD(String, T) in a single folder, this function saves subsets of an RDD(String, T) in separate folders.
abstract void start(org.openstreetmap.atlas.utilities.runtime.CommandMap command)
    The Spark job.
protected org.openstreetmap.atlas.utilities.runtime.Command.SwitchList switches()
-
-
-
Field Detail
-
INPUT
public static final org.openstreetmap.atlas.utilities.runtime.Command.Switch<java.lang.String> INPUT
-
OUTPUT
public static final org.openstreetmap.atlas.utilities.runtime.Command.Switch<java.lang.String> OUTPUT
-
MASTER
public static final org.openstreetmap.atlas.utilities.runtime.Command.Switch<java.lang.String> MASTER
-
SPARK_OPTIONS
public static final org.openstreetmap.atlas.utilities.runtime.Command.Switch<java.util.Map<java.lang.String,java.lang.String>> SPARK_OPTIONS
-
ADDITIONAL_SPARK_OPTIONS
public static final org.openstreetmap.atlas.utilities.runtime.Command.Switch<java.util.Map<java.lang.String,java.lang.String>> ADDITIONAL_SPARK_OPTIONS
-
COMPRESS_OUTPUT
public static final org.openstreetmap.atlas.utilities.runtime.Command.Switch<java.lang.String> COMPRESS_OUTPUT
-
SPARK_CONTEXT_PROVIDER
public static final org.openstreetmap.atlas.utilities.runtime.Command.Switch<SparkContextProvider> SPARK_CONTEXT_PROVIDER
-
SUCCESS_FILE
public static final java.lang.String SUCCESS_FILE
- See Also:
- Constant Field Values
-
FAILED_FILE
public static final java.lang.String FAILED_FILE
- See Also:
- Constant Field Values
-
SAVING_SEPARATOR
public static final java.lang.String SAVING_SEPARATOR
- See Also:
- Constant Field Values
-
-
Method Detail
-
resource
public static org.openstreetmap.atlas.streaming.resource.Resource resource(java.lang.String path, java.util.Map<java.lang.String,java.lang.String> configurationMap)
-
getName
public abstract java.lang.String getName()
- Returns:
- The name of the job
-
onRun
public int onRun(org.openstreetmap.atlas.utilities.runtime.CommandMap command)
- Specified by:
onRun in class org.openstreetmap.atlas.utilities.runtime.Command
-
start
public abstract void start(org.openstreetmap.atlas.utilities.runtime.CommandMap command)
The Spark job.
- Parameters:
command - The arguments passed to the main method
-
configuration
protected org.apache.hadoop.conf.Configuration configuration()
-
configurationMap
protected java.util.Map<java.lang.String,java.lang.String> configurationMap()
-
copyToOutput
protected void copyToOutput(org.openstreetmap.atlas.utilities.runtime.CommandMap command, java.lang.String input, java.lang.String output)
-
getAlternateParallelFolderOutput
protected java.lang.String getAlternateParallelFolderOutput(java.lang.String output, java.lang.String name)
Get an alternate output based on the main output folder used for monitoring.
- Parameters:
output - The main output folder
name - The name of the alternate folder
- Returns:
- The alternate output
-
getAlternateSubFolderOutput
protected java.lang.String getAlternateSubFolderOutput(java.lang.String output, java.lang.String name)
Get an alternate output based on the main output folder used for monitoring. Use the sub-folder.
- Parameters:
output - The main output folder
name - The name of the alternate folder
- Returns:
- The alternate output
-
getContext
protected org.apache.spark.api.java.JavaSparkContext getContext()
-
input
protected java.lang.String input(org.openstreetmap.atlas.utilities.runtime.CommandMap command)
-
output
protected java.lang.String output(org.openstreetmap.atlas.utilities.runtime.CommandMap command)
-
outputToClean
protected java.util.List<java.lang.String> outputToClean(org.openstreetmap.atlas.utilities.runtime.CommandMap command)
Define all the folders to clean before a run.
- Parameters:
command - The command parameters sent to the main class.
- Returns:
- All the paths to clean
-
resource
protected org.openstreetmap.atlas.streaming.resource.Resource resource(java.lang.String path)
- Parameters:
path - The path to open (in URL format)
- Returns:
- The resource at this path
-
setContext
protected void setContext(org.apache.spark.api.java.JavaSparkContext context)
-
splitAndSaveAsHadoopFile
protected <T> void splitAndSaveAsHadoopFile(org.apache.spark.api.java.JavaPairRDD<java.lang.String,T> input, java.lang.String path, java.lang.Class<T> valueClass, java.lang.Class<? extends org.apache.hadoop.mapred.lib.MultipleOutputFormat<java.lang.String,T>> formatterClass, java.util.function.UnaryOperator<java.lang.String> keyReducer)
Instead of saving a full RDD(String, T) in a single folder, this function saves subsets of an RDD(String, T) in separate folders. The keyReducer function must return, for each key string, the unique String under which that key is grouped. For example, an RDD with keys "aaa_1", "aaa_2", and "bbb_1" and a function that takes the first part of the key as the grouping key will be saved in two different folders. If the path is /path/to/output, then the two folders will be /path/to-aaa/ and /path/to-bbb/.
This function might be slow, as it generates a Spark stage for each category in the RDD. In the example above, it would create two stages. As the number of stages increases, it can become very slow.
- Type Parameters:
T - The type of the object to save
- Parameters:
input - The RDD to save
path - The output path of the job
valueClass - The type to save as Hadoop file
formatterClass - The corresponding Hadoop formatter
keyReducer - The key reducing function explained above.
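The grouping that keyReducer performs can be illustrated without Spark. The following is a minimal sketch, not the SparkJob implementation: the class KeyReducerSketch, the constant FIRST_PART, and the helper groupKeys are hypothetical names introduced only for this example, and the rule "everything before the first underscore" is an assumed reading of "the first part of the key". It only shows which keys would end up in the same category, and therefore how many categories (and Spark stages) to expect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Illustrative only: none of these names belong to SparkJob, and no Spark is involved.
class KeyReducerSketch
{
    // Assumed convention: the grouping key is the text before the first underscore.
    // Keys without an underscore are used unchanged.
    static final UnaryOperator<String> FIRST_PART = key ->
    {
        final int index = key.indexOf('_');
        return index < 0 ? key : key.substring(0, index);
    };

    // Groups the keys by the category returned by the key reducer.
    // A TreeMap keeps the categories sorted so the result is deterministic.
    static Map<String, List<String>> groupKeys(final List<String> keys,
            final UnaryOperator<String> keyReducer)
    {
        final Map<String, List<String>> groups = new TreeMap<>();
        for (final String key : keys)
        {
            groups.computeIfAbsent(keyReducer.apply(key), ignored -> new ArrayList<>())
                    .add(key);
        }
        return groups;
    }

    public static void main(final String[] args)
    {
        final Map<String, List<String>> groups = groupKeys(
                Arrays.asList("aaa_1", "aaa_2", "bbb_1"), FIRST_PART);
        // Two categories: per the description above, that means two folders and two stages.
        System.out.println(groups); // prints {aaa=[aaa_1, aaa_2], bbb=[bbb_1]}
    }
}
```

A function of this shape would be passed as the keyReducer argument. Because each distinct category is described as costing one Spark stage, a reducer that yields few distinct values is likely preferable to one that yields many.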
-
switches
protected org.openstreetmap.atlas.utilities.runtime.Command.SwitchList switches()
- Specified by:
switches in class org.openstreetmap.atlas.utilities.runtime.Command
-
-