org.apache.hadoop.hive.ql.exec
Class ReduceSinkOperator

java.lang.Object
  extended by org.apache.hadoop.hive.ql.exec.Operator<T>
      extended by org.apache.hadoop.hive.ql.exec.TerminalOperator<ReduceSinkDesc>
          extended by org.apache.hadoop.hive.ql.exec.ReduceSinkOperator
All Implemented Interfaces:
Serializable, Cloneable, TopNHash.BinaryCollector, Node
Direct Known Subclasses:
VectorReduceSinkOperator

public class ReduceSinkOperator
extends TerminalOperator<ReduceSinkDesc>
implements Serializable, TopNHash.BinaryCollector

Reduce Sink Operator sends output to the reduce stage.

See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.hive.ql.exec.Operator
Operator.OperatorFunc, Operator.State
 
Field Summary
protected  ExprNodeEvaluator[] bucketEval
          Evaluators for bucketing columns.
protected  Object[][] cachedKeys
          This two dimensional array holds key data and a corresponding Union object which contains the tag identifying the aggregate expression for distinct columns.
protected  Object[] cachedValues
           
protected  List<List<Integer>> distinctColIndices
           
protected  ExprNodeEvaluator[] keyEval
          The evaluators for the key columns.
protected  boolean keyIsText
           
protected  ObjectInspector keyObjectInspector
           
protected  Serializer keySerializer
           
protected  HiveKey keyWritable
           
protected  int numDistinctExprs
           
protected  int numDistributionKeys
           
protected  org.apache.hadoop.mapred.OutputCollector out
           
protected  ExprNodeEvaluator[] partitionEval
          The evaluators for the partition columns (CLUSTER BY or DISTRIBUTE BY in Hive language).
protected  Random random
           
protected  TopNHash reducerHash
           
protected  byte[] tagByte
           
protected  ExprNodeEvaluator[] valueEval
          The evaluators for the value columns.
protected  ObjectInspector valueObjectInspector
           
protected  Serializer valueSerializer
           
 
Fields inherited from class org.apache.hadoop.hive.ql.exec.Operator
alias, childOperators, childOperatorsArray, childOperatorsTag, colExprMap, conf, done, groupKeyObject, groupKeyOI, HIVECOUNTERCREATEDFILES, HIVECOUNTERFATAL, id, inputObjInspectors, isLogInfoEnabled, LOG, operatorId, outputObjInspector, parentOperators, reporter, state, statsMap
 
Constructor Summary
ReduceSinkOperator()
           
 
Method Summary
protected  void closeOp(boolean abort)
          Operator specific close routine.
 void collect(byte[] key, byte[] value, int hash)
           
protected  void collect(org.apache.hadoop.io.BytesWritable keyWritable, org.apache.hadoop.io.Writable valueWritable)
           
 String[] getInputAliases()
           
 String getName()
          Implements the getName function for the Node Interface.
static String getOperatorName()
           
 OperatorType getType()
          Return the type of the specific operator among the types in OperatorType.
protected static StructObjectInspector initEvaluatorsAndReturnStruct(ExprNodeEvaluator[] evals, List<List<Integer>> distinctColIndices, List<String> outputColNames, int length, ObjectInspector rowInspector)
          Initializes array of ExprNodeEvaluator.
protected  void initializeOp(org.apache.hadoop.conf.Configuration hconf)
          Operator specific initialization.
 boolean opAllowedBeforeMapJoin()
           
 void processOp(Object row, int tag)
          Process the row.
 void setInputAliases(String[] inputAliases)
           
 void setOutputCollector(org.apache.hadoop.mapred.OutputCollector _out)
           
protected  HiveKey toHiveKey(Object obj, int tag, Integer distLength)
           
 
Methods inherited from class org.apache.hadoop.hive.ql.exec.Operator
acceptLimitPushdown, allInitializedParentsAreClosed, areAllParentsInitialized, augmentPlan, cleanUpInputFileChanged, cleanUpInputFileChangedOp, clone, cloneOp, cloneRecursiveChildren, close, columnNamesRowResolvedCanBeObtained, defaultEndGroup, defaultStartGroup, dump, dump, endGroup, flush, forward, getAdditionalCounters, getChildOperators, getChildren, getColumnExprMap, getConf, getConfiguration, getDone, getExecContext, getGroupKeyObject, getGroupKeyObjectInspector, getIdentifier, getInputObjInspectors, getNextCntr, getNumChild, getNumParent, getOperatorId, getOpTraits, getOutputObjInspector, getParentOperators, getSchema, getStatistics, getStats, initEvaluators, initEvaluators, initEvaluatorsAndReturnStruct, initialize, initialize, initializeChildren, initializeLocalWork, initOperatorId, isUseBucketizedHiveInputFormat, jobClose, jobCloseOp, logStats, opAllowedAfterMapJoin, opAllowedBeforeSortMergeJoin, opAllowedConvertMapJoin, passExecContext, preorderMap, processGroup, removeChild, removeChildAndAdoptItsChildren, removeChildren, removeParent, replaceChild, replaceParent, reset, resetId, resetStats, setAlias, setChildOperators, setColumnExprMap, setConf, setDone, setExecContext, setGroupKeyObject, setGroupKeyObjectInspector, setId, setInputObjInspectors, setOperatorId, setOpTraits, setParentOperators, setReporter, setSchema, setStatistics, setUseBucketizedHiveInputFormat, startGroup, supportAutomaticSortMergeJoin, supportSkewJoinOptimization, supportUnionRemoveOptimization, toString, toString
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

out

protected transient org.apache.hadoop.mapred.OutputCollector out

keyEval

protected transient ExprNodeEvaluator[] keyEval
The evaluators for the key columns. Key columns decide the sort order on the reducer side. Key columns are passed to the reducer in the "key".
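
As an illustration of how key columns can decide sort order, here is a minimal sketch that compares two serialized keys byte by byte as unsigned values. It assumes the keys were serialized so that byte order matches value order; this page does not state which serializer is used. KeyOrderSketch and compareKeys are hypothetical names, not part of this class.

```java
import java.util.Arrays;

public class KeyOrderSketch {
    // Hypothetical helper: lexicographic comparison of two serialized keys,
    // treating each byte as unsigned (0..255).
    public static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[][] keys = { {2, 1}, {1, 3}, {1, 2} };
        // Sort the serialized keys the way a reducer-side sort might order them.
        Arrays.sort(keys, KeyOrderSketch::compareKeys);
        for (byte[] k : keys) {
            System.out.println(Arrays.toString(k));
        }
    }
}
```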


valueEval

protected transient ExprNodeEvaluator[] valueEval
The evaluators for the value columns. Value columns are passed to the reducer in the "value".


partitionEval

protected transient ExprNodeEvaluator[] partitionEval
The evaluators for the partition columns (CLUSTER BY or DISTRIBUTE BY in Hive language). Partition columns decide the reducer that the current row goes to. Partition columns are not passed to the reducer.
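
To make "decide the reducer" concrete, the following minimal sketch hashes the evaluated partition column values and maps the hash to a reducer index. The masking step mirrors the common Hadoop hash-partitioning idiom. PartitionSketch, partitionHash and reducerFor are hypothetical names, and the way this class actually combines column hashes is not shown on this page.

```java
import java.util.Arrays;

public class PartitionSketch {
    // Hypothetical helper: combine the evaluated partition column values into one hash.
    public static int partitionHash(Object[] partitionValues) {
        return Arrays.hashCode(partitionValues);
    }

    // Map a hash code to a reducer index in [0, numReducers).
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    public static int reducerFor(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        Object[] rowA = {"us", 2014};
        Object[] rowB = {"us", 2014};
        // Rows with equal partition column values always land on the same reducer.
        System.out.println(reducerFor(partitionHash(rowA), 4) == reducerFor(partitionHash(rowB), 4));
    }
}
```

Because only the hash of the partition columns is used for routing, the columns themselves need not travel to the reducer unless they also appear among the key or value columns.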


bucketEval

protected transient ExprNodeEvaluator[] bucketEval
Evaluators for bucketing columns. This is used to compute bucket number.


keySerializer

protected transient Serializer keySerializer

keyIsText

protected transient boolean keyIsText

valueSerializer

protected transient Serializer valueSerializer

tagByte

protected transient byte[] tagByte

numDistributionKeys

protected transient int numDistributionKeys

numDistinctExprs

protected transient int numDistinctExprs

reducerHash

protected transient TopNHash reducerHash

keyWritable

protected transient HiveKey keyWritable

keyObjectInspector

protected transient ObjectInspector keyObjectInspector

valueObjectInspector

protected transient ObjectInspector valueObjectInspector

cachedValues

protected transient Object[] cachedValues

distinctColIndices

protected transient List<List<Integer>> distinctColIndices

cachedKeys

protected transient Object[][] cachedKeys
This two-dimensional array holds key data and a corresponding Union object which contains the tag identifying the aggregate expression for distinct columns. If there is no distinct expression, cachedKeys simply looks like this:
  cachedKeys[0] = [col0][col1]
With two distinct expressions, a union(tag:key) is attached for each distinct expression:
  cachedKeys[0] = [col0][col1][0:dist1]
  cachedKeys[1] = [col0][col1][1:dist2]
In this case, the child GBY evaluates distinct values with an expression like KEY.col2:0.dist1. See ExprNodeColumnEvaluator.
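
The layout described for cachedKeys can be sketched with plain Java arrays. This is only an illustration under stated assumptions: TaggedValue is a hypothetical stand-in for the Union object, and buildCachedKeys is a hypothetical helper, not a method of this class.

```java
import java.util.Arrays;

public class CachedKeysSketch {
    // Hypothetical stand-in for a union value: a tag plus the tagged object.
    static final class TaggedValue {
        final int tag;
        final Object value;

        TaggedValue(int tag, Object value) {
            this.tag = tag;
            this.value = value;
        }

        @Override
        public String toString() {
            return tag + ":" + value;
        }
    }

    // Build one key row per distinct expression; with no distinct
    // expressions there is a single row holding only the shared key columns.
    public static Object[][] buildCachedKeys(Object[] keyColumns, Object[] distinctValues) {
        if (distinctValues.length == 0) {
            return new Object[][] { keyColumns.clone() };
        }
        Object[][] keys = new Object[distinctValues.length][];
        for (int i = 0; i < distinctValues.length; i++) {
            // Copy the shared key columns and append the tagged distinct value.
            Object[] row = Arrays.copyOf(keyColumns, keyColumns.length + 1);
            row[keyColumns.length] = new TaggedValue(i, distinctValues[i]);
            keys[i] = row;
        }
        return keys;
    }

    public static void main(String[] args) {
        Object[][] keys = buildCachedKeys(
                new Object[] {"col0", "col1"}, new Object[] {"dist1", "dist2"});
        for (Object[] row : keys) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```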


random

protected transient Random random
Constructor Detail

ReduceSinkOperator

public ReduceSinkOperator()
Method Detail

setInputAliases

public void setInputAliases(String[] inputAliases)

getInputAliases

public String[] getInputAliases()

setOutputCollector

public void setOutputCollector(org.apache.hadoop.mapred.OutputCollector _out)
Overrides:
setOutputCollector in class Operator<ReduceSinkDesc>

initializeOp

protected void initializeOp(org.apache.hadoop.conf.Configuration hconf)
                     throws HiveException
Description copied from class: Operator
Operator specific initialization.

Overrides:
initializeOp in class Operator<ReduceSinkDesc>
Throws:
HiveException

initEvaluatorsAndReturnStruct

protected static StructObjectInspector initEvaluatorsAndReturnStruct(ExprNodeEvaluator[] evals,
                                                                     List<List<Integer>> distinctColIndices,
                                                                     List<String> outputColNames,
                                                                     int length,
                                                                     ObjectInspector rowInspector)
                                                              throws HiveException
Initializes an array of ExprNodeEvaluator. Adds a Union field for distinct column indices for group by. Puts the return values into a StructObjectInspector with the output column names. If distinctColIndices is empty, the object inspector is the same as the one returned by Operator.initEvaluatorsAndReturnStruct(ExprNodeEvaluator[], List, ObjectInspector).

Throws:
HiveException

processOp

public void processOp(Object row,
                      int tag)
               throws HiveException
Description copied from class: Operator
Process the row.

Specified by:
processOp in class Operator<ReduceSinkDesc>
Parameters:
row - The object representing the row.
tag - The tag of the row, which usually indicates which parent the row comes from. Rows with the same tag should always have exactly the same rowInspector.
Throws:
HiveException

toHiveKey

protected HiveKey toHiveKey(Object obj,
                            int tag,
                            Integer distLength)
                     throws SerDeException
Throws:
SerDeException

collect

public void collect(byte[] key,
                    byte[] value,
                    int hash)
             throws IOException
Specified by:
collect in interface TopNHash.BinaryCollector
Throws:
IOException

collect

protected void collect(org.apache.hadoop.io.BytesWritable keyWritable,
                       org.apache.hadoop.io.Writable valueWritable)
                throws IOException
Throws:
IOException

closeOp

protected void closeOp(boolean abort)
                throws HiveException
Description copied from class: Operator
Operator specific close routine. Operators that inherit from this class should override this function for their specific cleanup routine.

Overrides:
closeOp in class Operator<ReduceSinkDesc>
Throws:
HiveException

getName

public String getName()
Description copied from class: Operator
Implements the getName function for the Node Interface.

Specified by:
getName in interface Node
Overrides:
getName in class Operator<ReduceSinkDesc>
Returns:
the name of the operator

getOperatorName

public static String getOperatorName()

getType

public OperatorType getType()
Description copied from class: Operator
Return the type of the specific operator among the types in OperatorType.

Specified by:
getType in class Operator<ReduceSinkDesc>
Returns:
OperatorType.*

opAllowedBeforeMapJoin

public boolean opAllowedBeforeMapJoin()
Overrides:
opAllowedBeforeMapJoin in class Operator<ReduceSinkDesc>


Copyright © 2014 The Apache Software Foundation. All rights reserved.