public class FastConvergeScramblingMethod extends ScramblingMethodBase
Outlier values: the values below the 0.1th percentile or above the 99.9th percentile of each numeric column. For example, if a table has 20 numeric columns, then in the worst case 20 * 0.2% = 4% of the tuples contain outlier values.
Blocks for Tier 0: Tier 0 takes up to 50% of each block.
Blocks for Tier 1: Tier 0 + Tier 1 take up to 80% of each block.
Blocks for Tier 2: Tier 2 takes the rest of the space in each block.
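The arithmetic in the description above can be sketched as follows. This is only an illustration under stated assumptions, not the class's actual implementation: `TierLayoutSketch`, `worstCaseOutlierFraction`, and `allocate` are hypothetical names, and the sketch assumes that the 50% and 80% limits are hard caps applied to a single block in tier order.

```java
/** Hypothetical helper for illustration only; not part of the documented API. */
final class TierLayoutSketch {

    // Assumption: each numeric column flags its bottom 0.1% and top 0.1% of values.
    static final double OUTLIER_FRACTION_PER_COLUMN = 0.001 + 0.001;

    /** Worst case: outliers of different columns never fall in the same tuple. */
    static double worstCaseOutlierFraction(int numericColumns) {
        return Math.min(1.0, numericColumns * OUTLIER_FRACTION_PER_COLUMN);
    }

    /**
     * Splits one block among the three tiers, given how many Tier 0 and
     * Tier 1 rows are available. Returns {tier0Rows, tier1Rows, tier2Rows}.
     */
    static long[] allocate(long blockSize, long tier0Demand, long tier1Demand) {
        long tier0Cap = blockSize * 50 / 100;   // Tier 0: up to 50% of the block
        long tier01Cap = blockSize * 80 / 100;  // Tier 0 + Tier 1: up to 80%
        long tier0 = Math.min(tier0Demand, tier0Cap);
        long tier1 = Math.min(tier1Demand, tier01Cap - tier0);
        long tier2 = blockSize - tier0 - tier1; // Tier 2: the rest of the block
        return new long[] {tier0, tier1, tier2};
    }

    public static void main(String[] args) {
        System.out.println(worstCaseOutlierFraction(20));
        long[] rows = allocate(1000, 700, 400);
        System.out.println(rows[0] + " " + rows[1] + " " + rows[2]);
    }
}
```

With a block of 1000 rows, 700 candidate Tier 0 rows, and 400 candidate Tier 1 rows, this sketch gives Tier 0 500 rows, Tier 1 300 rows, and leaves 200 rows for Tier 2.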
| Modifier and Type | Field and Description |
|---|---|
| static String | MAIN_TABLE_SOURCE_ALIAS_NAME |
| static String | RIGHT_TABLE_SOURCE_ALIAS_NAME |
Inherited fields: blockSize

| Constructor and Description |
|---|
| FastConvergeScramblingMethod(long blockSize, String scratchpadSchemaName) |
| FastConvergeScramblingMethod(long blockSize, String scratchpadSchemaName, String primaryColumnName) |
| Modifier and Type | Method and Description |
|---|---|
| int | getBlockCount() |
| List<Double> | getCumulativeProbabilityDistributionForTier(Map<String,Object> metaData, int tier) |
| String | getMainTableAlias() |
| AbstractRelation | getScramblingSource(String originalSchema, String originalTable, Map<String,Object> metaData) Returns the table that should be used in the final scrambling stage. |
| List<ExecutableNodeBase> | getStatisticsNode(String oldSchemaName, String oldTableName, String columnMetaTokenKey, String partitionMetaTokenKey) Computes three nodes. |
| int | getTierCount() |
| List<UnnamedColumn> | getTierExpressions(Map<String,Object> metaData) |
Inherited methods: getStoredCumulativeProbabilityDistributionForTier, storeCumulativeProbabilityDistribution

public static final String MAIN_TABLE_SOURCE_ALIAS_NAME
public static final String RIGHT_TABLE_SOURCE_ALIAS_NAME
public FastConvergeScramblingMethod(long blockSize, String scratchpadSchemaName)
public List<ExecutableNodeBase> getStatisticsNode(String oldSchemaName, String oldTableName, String columnMetaTokenKey, String partitionMetaTokenKey)
Recall that channels 100 and 101 are reserved for column meta and partition meta, respectively.
This method generates up to three nodes. The token keys set by those nodes are:
1. queryResult: contains avg, std, and count.
2. schemaName, tableName: the name of the temporary table that contains a list of large groups.
3. queryResult: contains the sum of the sizes of large groups.
public List<UnnamedColumn> getTierExpressions(Map<String,Object> metaData)
public List<Double> getCumulativeProbabilityDistributionForTier(Map<String,Object> metaData, int tier)
tier - 0, 1, ..., getTierCount() - 1
public AbstractRelation getScramblingSource(String originalSchema, String originalTable, Map<String,Object> metaData)
public String getMainTableAlias()
public int getBlockCount()
public int getTierCount()
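As an illustration of how a cumulative probability distribution such as the one returned by getCumulativeProbabilityDistributionForTier might be consumed, the sketch below maps a uniform random number to a block index. It rests on assumptions this page does not confirm: that the list holds one non-decreasing entry per block, that entry i is the probability of landing in a block with index at most i, and that the last entry is 1.0. `CumulativeDistributionSketch` and `pickBlock` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration only; not part of the documented API. */
final class CumulativeDistributionSketch {

    /**
     * Returns the index of the first entry that is strictly greater than u,
     * where u is expected to be drawn uniformly from [0, 1).
     * Assumption: cumulative is non-empty, non-decreasing, and ends at 1.0.
     */
    static int pickBlock(List<Double> cumulative, double u) {
        for (int i = 0; i < cumulative.size(); i++) {
            if (u < cumulative.get(i)) {
                return i;
            }
        }
        // Guards against rounding error when the last entry is slightly below 1.0.
        return cumulative.size() - 1;
    }

    public static void main(String[] args) {
        // Made-up distribution over three blocks: 10%, 30%, and 60% of the tuples.
        List<Double> cumulative = Arrays.asList(0.1, 0.4, 1.0);
        Random random = new Random();
        System.out.println(pickBlock(cumulative, random.nextDouble()));
    }
}
```

A linear scan is enough for a small number of blocks. For many blocks, a binary search over the same list would be a natural replacement.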
Copyright © 2018 University of Michigan. All rights reserved.