public class PartitionTap extends cascading.tap.partition.BasePartitionTap<Configuration,RecordReader,OutputCollector>
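Class PartitionTap can be used to write tuple streams out to files and sub-directories based on the values in the Tuple instance.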
The constructor takes an Hfs Tap and a Partition
implementation. This allows Tuple values at given positions to be used as directory names during write
operations, and directory names to be read back as data during read operations.
The key benefit is that data values do not need to be duplicated in both the directory names and the data files.
Only the fields declared by the parent Tap are read from or written to the underlying file system files, while
the fields declared by the Partition are read from or written to the directory names only. That is, the
PartitionTap instance will sink or source the partition fields, plus the parent Tap fields. The partition
fields and parent Tap fields do not need to have common field names.
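
The split between directory names and file contents described above can be modeled in plain Java. This is only a minimal sketch under stated assumptions: the class PartitionSplitSketch, its method names, and the "/" delimiter are hypothetical, and the code neither uses nor reproduces the Cascading classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone model; not part of the Cascading API.
class PartitionSplitSketch {

    // Write side: the partition field values become the relative directory path.
    static String directoryFor(Map<String, String> tuple, List<String> partitionFields) {
        StringBuilder path = new StringBuilder();
        for (String field : partitionFields) {
            if (path.length() > 0) {
                path.append('/'); // assumed delimiter
            }
            path.append(tuple.get(field));
        }
        return path.toString();
    }

    // Write side: only the remaining (parent Tap) fields are stored in the data file.
    static Map<String, String> fileRecordFor(Map<String, String> tuple, List<String> partitionFields) {
        Map<String, String> record = new LinkedHashMap<>(tuple);
        record.keySet().removeAll(partitionFields);
        return record;
    }

    // Read side: directory names are parsed back into the partition fields
    // and combined with the fields stored in the file.
    static Map<String, String> read(String directory, List<String> partitionFields,
                                    Map<String, String> fileRecord) {
        Map<String, String> tuple = new LinkedHashMap<>(fileRecord);
        String[] parts = directory.split("/");
        for (int i = 0; i < partitionFields.size(); i++) {
            tuple.put(partitionFields.get(i), parts[i]);
        }
        return tuple;
    }
}
```

In this model, a tuple with the fields year, month and entry and the partition fields year and month is written as a record holding only entry, under a directory built from the year and month values. Reading combines the two again.
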
Note that Hadoop can only sink to directories, and all files in those directories are "part-xxxxx" files.
openWritesThreshold limits the number of files that may be open for output at the same time. This value defaults to 300 files.
Each time the threshold is exceeded, 10% of the least recently used open files will be closed.
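
The eviction behavior described above can be illustrated with a small least-recently-used model. This is a sketch, not the Cascading implementation: the class OpenWritesSketch is hypothetical, and it assumes that "10%" means 10% of the threshold, with at least one file closed per eviction.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;

// Hypothetical stand-alone model; not part of the Cascading API.
class OpenWritesSketch {
    private final int threshold;
    // accessOrder=true keeps entries ordered from least to most recently used.
    private final LinkedHashMap<String, Boolean> open = new LinkedHashMap<>(16, 0.75f, true);
    // Paths that have been closed, in the order they were closed.
    final List<String> closed = new ArrayList<>();

    OpenWritesSketch(int threshold) {
        this.threshold = threshold;
    }

    // Records a write to the given partition path, opening it if needed.
    void write(String path) {
        open.put(path, Boolean.TRUE); // a put on an existing key also counts as an access
        if (open.size() > threshold) {
            // Assumption: close 10% of the threshold, but at least one file.
            int toClose = Math.max(1, threshold / 10);
            Iterator<String> leastRecentlyUsed = open.keySet().iterator();
            for (int i = 0; i < toClose && leastRecentlyUsed.hasNext(); i++) {
                closed.add(leastRecentlyUsed.next());
                leastRecentlyUsed.remove();
            }
        }
    }

    int openCount() {
        return open.size();
    }
}
```

With a small threshold, writing to many distinct partitions closes and reopens files repeatedly in this model, which suggests why grouping or sorting on the partition fields upstream can reduce churn.
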
PartitionTap will populate a given partition without regard to the case of the values being used. Thus
the resulting paths 2012/June/ and 2012/june/ will likely result in two open files into the same
location. Forcing the case to be consistent with a custom Partition implementation or an upstream
Function is recommended; see cascading.operation.expression.ExpressionFunction.
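
The normalization recommended above amounts to mapping every partition value to one canonical case before it is used as a path. A minimal plain-Java sketch follows. The class PartitionCaseSketch is hypothetical and only shows the normalization step itself, not how it would be wired into a Partition implementation or a Function.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-alone model; not part of the Cascading API.
class PartitionCaseSketch {

    // Lower-cases a partition value or path using a locale-independent rule.
    static String normalize(String partitionValue) {
        return partitionValue.toLowerCase(Locale.ROOT);
    }

    // Returns the distinct paths that remain after normalization.
    static Set<String> distinctPaths(List<String> paths) {
        Set<String> result = new LinkedHashSet<>();
        for (String path : paths) {
            result.add(normalize(path));
        }
        return result;
    }
}
```

Under this rule, 2012/June/ and 2012/june/ collapse to a single path, so only one file would be opened for that location.
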
Though Hadoop has no mechanism to prevent simultaneous writes to a directory from multiple jobs, that doesn't mean it's safe to do so. The same is true of the PartitionTap. Interleaving writes to a common parent (root) directory across multiple flows will very likely lead to data loss.
| Constructor | Description |
|---|---|
| PartitionTap(Hfs parent, cascading.tap.partition.Partition partition) | Constructor PartitionTap creates a new PartitionTap instance using the given parent Hfs Tap as the base path and default Scheme, and the partition. |
| PartitionTap(Hfs parent, cascading.tap.partition.Partition partition, int openWritesThreshold) | Constructor PartitionTap creates a new PartitionTap instance using the given parent Hfs Tap as the base path and default Scheme, and the partition. |
| PartitionTap(Hfs parent, cascading.tap.partition.Partition partition, cascading.tap.SinkMode sinkMode) | Constructor PartitionTap creates a new PartitionTap instance using the given parent Hfs Tap as the base path and default Scheme, and the partition. |
| PartitionTap(Hfs parent, cascading.tap.partition.Partition partition, cascading.tap.SinkMode sinkMode, boolean keepParentOnDelete) | Constructor PartitionTap creates a new PartitionTap instance using the given parent Hfs Tap as the base path and default Scheme, and the partition. |
| PartitionTap(Hfs parent, cascading.tap.partition.Partition partition, cascading.tap.SinkMode sinkMode, boolean keepParentOnDelete, int openWritesThreshold) | Constructor PartitionTap creates a new PartitionTap instance using the given parent Hfs Tap as the base path and default Scheme, and the partition. |
| Modifier and Type | Method and Description |
|---|---|
| boolean | commitResource(Configuration conf) |
| protected cascading.tuple.TupleEntrySchemeCollector | createTupleEntrySchemeCollector(cascading.flow.FlowProcess<? extends Configuration> flowProcess, cascading.tap.Tap parent, java.lang.String path, long sequence) |
| protected cascading.tuple.TupleEntrySchemeIterator | createTupleEntrySchemeIterator(cascading.flow.FlowProcess<? extends Configuration> flowProcess, cascading.tap.Tap parent, java.lang.String path, RecordReader recordReader) |
| protected java.lang.String | getCurrentIdentifier(cascading.flow.FlowProcess<? extends Configuration> flowProcess) |
| cascading.tuple.TupleEntryIterator | openForRead(cascading.flow.FlowProcess<? extends Configuration> flowProcess, RecordReader input) |
| void | sourceConfInit(cascading.flow.FlowProcess<? extends Configuration> flowProcess, Configuration conf) |
Methods inherited from class cascading.tap.partition.BasePartitionTap:
addSourcePartitionFilter, castFileType, createResource, deleteResource, equals, getChildIdentifiers, getChildIdentifiers, getChildIdentifiers, getChildIdentifiers, getChildPartitionIdentifiers, getFilteredPartitionIdentifiers, getIdentifier, getModifiedTime, getOpenWritesThreshold, getParent, getPartition, getSize, getSize, hashCode, isDirectory, isDirectory, openForWrite, prepareResourceForRead, prepareResourceForWrite, resourceExists, rollbackResource, toString

Methods inherited from class cascading.tap.Tap:
createResource, deleteResource, entryStream, entryStream, entryStreamCopy, entryStreamCopy, flowConfInit, getConfigDef, getFullIdentifier, getFullIdentifier, getModifiedTime, getNodeConfigDef, getScheme, getSinkFields, getSinkMode, getSourceFields, getStepConfigDef, getTrace, hasConfigDef, hasNodeConfigDef, hasStepConfigDef, id, isKeep, isReplace, isSink, isSource, isTemporary, isUpdate, openForRead, openForReadUnchecked, openForWrite, outgoingScopeFor, presentSinkFields, presentSourceFields, resolveIncomingOperationArgumentFields, resolveIncomingOperationPassThroughFields, resourceExists, retrieveSinkFields, retrieveSourceFields, setScheme, sinkConfInit, spliterator, splititerator, taps, tupleStream, tupleStream, tupleStreamCopy, tupleStreamCopy

@ConstructorProperties(value={"parent","partition"})
public PartitionTap(Hfs parent,
cascading.tap.partition.Partition partition)
Constructor PartitionTap creates a new PartitionTap instance using the given parent Hfs Tap as the
base path and default Scheme, and the partition.
Parameters:
parent - of type Hfs
partition - of type Partition

@ConstructorProperties(value={"parent","partition","openWritesThreshold"})
public PartitionTap(Hfs parent,
cascading.tap.partition.Partition partition,
int openWritesThreshold)
Constructor PartitionTap creates a new PartitionTap instance using the given parent Hfs Tap as the
base path and default Scheme, and the partition.
openWritesThreshold limits the number of files that may be open for output at the same time.
Parameters:
parent - of type Hfs
partition - of type Partition
openWritesThreshold - of type int

@ConstructorProperties(value={"parent","partition","sinkMode"})
public PartitionTap(Hfs parent,
cascading.tap.partition.Partition partition,
cascading.tap.SinkMode sinkMode)
Constructor PartitionTap creates a new PartitionTap instance using the given parent Hfs Tap as the
base path and default Scheme, and the partition.
Parameters:
parent - of type Hfs
partition - of type Partition
sinkMode - of type SinkMode

@ConstructorProperties(value={"parent","partition","sinkMode","keepParentOnDelete"})
public PartitionTap(Hfs parent,
cascading.tap.partition.Partition partition,
cascading.tap.SinkMode sinkMode,
boolean keepParentOnDelete)
Constructor PartitionTap creates a new PartitionTap instance using the given parent Hfs Tap as the
base path and default Scheme, and the partition.
keepParentOnDelete, when set to true, prevents the parent Tap from being deleted when BasePartitionTap.deleteResource(Object)
is called, typically an issue when used inside a Cascade.
Parameters:
parent - of type Hfs
partition - of type Partition
sinkMode - of type SinkMode
keepParentOnDelete - of type boolean

@ConstructorProperties(value={"parent","partition","sinkMode","keepParentOnDelete","openWritesThreshold"})
public PartitionTap(Hfs parent,
cascading.tap.partition.Partition partition,
cascading.tap.SinkMode sinkMode,
boolean keepParentOnDelete,
int openWritesThreshold)
Constructor PartitionTap creates a new PartitionTap instance using the given parent Hfs Tap as the
base path and default Scheme, and the partition.
keepParentOnDelete, when set to true, prevents the parent Tap from being deleted when BasePartitionTap.deleteResource(Object)
is called, typically an issue when used inside a Cascade.
openWritesThreshold limits the number of files that may be open for output at the same time.
Parameters:
parent - of type Hfs
partition - of type Partition
sinkMode - of type SinkMode
keepParentOnDelete - of type boolean
openWritesThreshold - of type int

protected cascading.tuple.TupleEntrySchemeCollector createTupleEntrySchemeCollector(cascading.flow.FlowProcess<? extends Configuration> flowProcess, cascading.tap.Tap parent, java.lang.String path, long sequence) throws java.io.IOException
createTupleEntrySchemeCollector in class cascading.tap.partition.BasePartitionTap<Configuration,RecordReader,OutputCollector>
Throws: java.io.IOException

protected cascading.tuple.TupleEntrySchemeIterator createTupleEntrySchemeIterator(cascading.flow.FlowProcess<? extends Configuration> flowProcess, cascading.tap.Tap parent, java.lang.String path, RecordReader recordReader) throws java.io.IOException
createTupleEntrySchemeIterator in class cascading.tap.partition.BasePartitionTap<Configuration,RecordReader,OutputCollector>
Throws: java.io.IOException

protected java.lang.String getCurrentIdentifier(cascading.flow.FlowProcess<? extends Configuration> flowProcess)
getCurrentIdentifier in class cascading.tap.partition.BasePartitionTap<Configuration,RecordReader,OutputCollector>

public void sourceConfInit(cascading.flow.FlowProcess<? extends Configuration> flowProcess, Configuration conf)
sourceConfInit in class cascading.tap.Tap<Configuration,RecordReader,OutputCollector>

public cascading.tuple.TupleEntryIterator openForRead(cascading.flow.FlowProcess<? extends Configuration> flowProcess, RecordReader input) throws java.io.IOException
openForRead in class cascading.tap.partition.BasePartitionTap<Configuration,RecordReader,OutputCollector>
Throws: java.io.IOException

public boolean commitResource(Configuration conf) throws java.io.IOException
commitResource in class cascading.tap.partition.BasePartitionTap<Configuration,RecordReader,OutputCollector>
Throws: java.io.IOException

Copyright © 2007-2021 Cascading Maintainers. All Rights Reserved.