Package net.tlabs.tablesaw.parquet
Class TablesawParquetReadOptions.Builder
java.lang.Object
net.tlabs.tablesaw.parquet.TablesawParquetReadOptions.Builder
- Enclosing class:
- TablesawParquetReadOptions
-
Constructor Summary
Constructors -
Method Summary
Method and Description
allowDuplicateColumnNames(Boolean allow)
build()
columnTypes(Function<String, tech.tablesaw.api.ColumnType> columnTypeFunction)
columnTypes(tech.tablesaw.api.ColumnType[] columnTypes)
    If used in conjunction with the withOnlyTheseColumns(String...) option, the provided ColumnType array must contain only the selected columns in the order they were provided.
columnTypesPartial(Function<String, Optional<tech.tablesaw.api.ColumnType>> columnTypeFunction)
columnTypesPartial(Map<String, tech.tablesaw.api.ColumnType> columnTypeByName)
columnTypesToDetect(List<tech.tablesaw.api.ColumnType> columnTypesToDetect)
    This option can be used to select whether to use: ShortColumn or IntColumn for parquet short and byte columns.
dateFormat(String dateFormat)
    Deprecated.
dateFormat(DateTimeFormatter dateFormat)
    This option is not used by TablesawParquetReadOptions.
dateTimeFormat(String dateTimeFormat)
    Deprecated.
dateTimeFormat(DateTimeFormatter dateTimeFormat)
    This option is not used by TablesawParquetReadOptions.
header(boolean header)
    This option is not used by TablesawParquetReadOptions.
ignoreZeroDecimal(boolean ignoreZeroDecimal)
    This option is not used by TablesawParquetReadOptions.
locale
    This option is not used by TablesawParquetReadOptions.
maxCharsPerColumn(int maxCharsPerColumn)
    This option is not used by TablesawParquetReadOptions.
missingValueIndicator(String... missingValueIndicator)
    This option is not used by TablesawParquetReadOptions.
sample(boolean sample)
    This option is not used by TablesawParquetReadOptions.
skipRowsWithInvalidColumnCount(boolean skipRowsWithInvalidColumnCount)
timeFormat(String timeFormat)
    Deprecated.
timeFormat(DateTimeFormatter timeFormat)
    This option is not used by TablesawParquetReadOptions.
withConvertInt96ToTimestamp(boolean convertInt96ToTimestamp)
    Option to read parquet INT96 values as TimeStamp.
withManageGroupAs(TablesawParquetReadOptions.ManageGroupsAs manageGroupsAs)
    Option for managing parquet groups (incl. repeats).
withOnlyTheseColumns(String... columns)
    Read only a subset of columns, identified by name.
withUnnanotatedBinaryAs(TablesawParquetReadOptions.UnnanotatedBinaryAs unnanotatedBinaryAs)
    Option for managing unannotated parquet Binary.
-
Constructor Details
-
Builder
-
-
Method Details
-
build
-
header
This option is not used by TablesawParquetReadOptions.
-
tableName
-
sample
This option is not used by TablesawParquetReadOptions.
-
dateFormat
Deprecated. This option is not used by TablesawParquetReadOptions.
-
timeFormat
Deprecated. This option is not used by TablesawParquetReadOptions.
-
dateTimeFormat
Deprecated. This option is not used by TablesawParquetReadOptions.
-
dateFormat
This option is not used by TablesawParquetReadOptions.
-
timeFormat
This option is not used by TablesawParquetReadOptions.
-
dateTimeFormat
This option is not used by TablesawParquetReadOptions.
-
maxCharsPerColumn
This option is not used by TablesawParquetReadOptions.
-
locale
This option is not used by TablesawParquetReadOptions.
-
missingValueIndicator
This option is not used by TablesawParquetReadOptions.
-
columnTypesToDetect
public TablesawParquetReadOptions.Builder columnTypesToDetect(List<tech.tablesaw.api.ColumnType> columnTypesToDetect)
This option can be used to select whether to use:
  - ShortColumn or IntColumn for parquet short and byte columns.
  - FloatColumn or DoubleColumn for parquet float columns.
If the list does not contain ColumnType.SHORT, an IntColumn will be used for parquet short and byte columns. If the list does not contain ColumnType.FLOAT, a DoubleColumn will be used for parquet float columns.
- Parameters:
columnTypesToDetect - only checked for presence of ColumnType.SHORT and ColumnType.FLOAT
- Returns:
- this builder
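The fallback rule described for columnTypesToDetect can be modelled in a few lines. The sketch below is a plain-JDK illustration and not tablesaw-parquet code: the enums ParquetKind and Type and the method resolve are hypothetical stand-ins for the parquet source types and for the tablesaw ColumnType constants, introduced only to make the rule concrete.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative model only; none of these names come from the library.
final class DetectionRuleSketch {
    // Hypothetical stand-in for the parquet source types the option affects.
    enum ParquetKind { BYTE, SHORT, FLOAT }

    // Hypothetical stand-in for the relevant ColumnType constants.
    enum Type { SHORT, INTEGER, FLOAT, DOUBLE }

    // Only the presence of SHORT and FLOAT in typesToDetect is consulted.
    static Type resolve(ParquetKind kind, Set<Type> typesToDetect) {
        switch (kind) {
            case BYTE:
            case SHORT:
                return typesToDetect.contains(Type.SHORT) ? Type.SHORT : Type.INTEGER;
            case FLOAT:
                return typesToDetect.contains(Type.FLOAT) ? Type.FLOAT : Type.DOUBLE;
            default:
                throw new IllegalArgumentException("unexpected kind: " + kind);
        }
    }

    public static void main(String[] args) {
        Set<Type> onlyShort = EnumSet.of(Type.SHORT);
        System.out.println(resolve(ParquetKind.BYTE, onlyShort));  // SHORT is listed, so a short column is kept
        System.out.println(resolve(ParquetKind.FLOAT, onlyShort)); // FLOAT is not listed, so the double fallback applies
    }
}
```

In this model, entries other than SHORT and FLOAT have no effect on the result, which matches the parameter description.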
-
minimizeColumnSizes
-
ignoreZeroDecimal
This option is not used by TablesawParquetReadOptions.
-
allowDuplicateColumnNames
-
skipRowsWithInvalidColumnCount
public TablesawParquetReadOptions.Builder skipRowsWithInvalidColumnCount(boolean skipRowsWithInvalidColumnCount)
-
columnTypes
If used in conjunction with the withOnlyTheseColumns(String...) option, the provided ColumnType array must contain only the selected columns in the order they were provided.
-
columnTypes
public TablesawParquetReadOptions.Builder columnTypes(Function<String, tech.tablesaw.api.ColumnType> columnTypeFunction)
-
columnTypesPartial
public TablesawParquetReadOptions.Builder columnTypesPartial(Function<String, Optional<tech.tablesaw.api.ColumnType>> columnTypeFunction)
-
columnTypesPartial
public TablesawParquetReadOptions.Builder columnTypesPartial(Map<String, tech.tablesaw.api.ColumnType> columnTypeByName)
-
withConvertInt96ToTimestamp
public TablesawParquetReadOptions.Builder withConvertInt96ToTimestamp(boolean convertInt96ToTimestamp)
Option to read parquet INT96 values as TimeStamp. False by default.
- Parameters:
convertInt96ToTimestamp - set to true to read parquet INT96 values as TimeStamp, false to read as String.
- Returns:
- this builder
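For background on what an INT96 timestamp holds: by the convention used by common parquet writers, the 12 bytes are little-endian, with the nanoseconds within the day in the first 8 bytes and the Julian day number in the last 4. The sketch below decodes that layout with the JDK only. It is an assumption-laden illustration, not this library's implementation: the class Int96Sketch and its methods are hypothetical, and no time-zone adjustment is applied.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Illustrative decoder for the conventional INT96 timestamp layout.
final class Int96Sketch {
    // Julian day number that corresponds to 1970-01-01 (epoch day 0).
    static final int JULIAN_DAY_OF_EPOCH = 2_440_588;

    static LocalDateTime decode(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 values are exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // first 8 bytes
        int julianDay = buf.getInt();    // last 4 bytes
        LocalDate date = LocalDate.ofEpochDay((long) julianDay - JULIAN_DAY_OF_EPOCH);
        return LocalDateTime.of(date, LocalTime.ofNanoOfDay(nanosOfDay));
    }

    // Helper that builds a value in the same layout, used here for the demo only.
    static byte[] encode(long nanosOfDay, int julianDay) {
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }

    public static void main(String[] args) {
        long oneHourInNanos = 3_600_000_000_000L;
        // One day after the epoch, one hour into the day.
        System.out.println(decode(encode(oneHourInNanos, JULIAN_DAY_OF_EPOCH + 1)));
    }
}
```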
-
withUnnanotatedBinaryAs
public TablesawParquetReadOptions.Builder withUnnanotatedBinaryAs(TablesawParquetReadOptions.UnnanotatedBinaryAs unnanotatedBinaryAs)
Option for managing unannotated parquet Binary.
  - With UnnanotatedBinaryAs.STRING, these binaries are converted to UTF-8 Strings.
  - With UnnanotatedBinaryAs.HEXSTRING, these binaries are converted to hexadecimal Strings.
  - With UnnanotatedBinaryAs.SKIP, these fields are skipped.
- Parameters:
unnanotatedBinaryAs - the UnnanotatedBinaryAs option
- Returns:
- this builder
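To make the difference between the STRING and HEXSTRING renderings concrete, the sketch below converts the same bytes both ways using only the JDK. It is an illustration, not the library's code: the class BinaryRenderingSketch and its methods are hypothetical, and the exact hexadecimal format the library produces (letter case, any prefix) is not stated in this page, so lowercase without a prefix is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Illustrative conversions of one binary value to the two string forms.
final class BinaryRenderingSketch {
    // Interprets the bytes as UTF-8 text.
    static String asUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Renders each byte as two hexadecimal digits (lowercase is an assumption).
    static String asHex(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = {0x54, 0x61, 0x62};
        System.out.println(asUtf8(raw)); // prints Tab
        System.out.println(asHex(raw));  // prints 546162
    }
}
```

The hexadecimal form is lossless for arbitrary bytes, whereas the UTF-8 form is only meaningful when the binary actually holds text.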
-
withManageGroupAs
public TablesawParquetReadOptions.Builder withManageGroupAs(TablesawParquetReadOptions.ManageGroupsAs manageGroupsAs)
Option for managing parquet groups (incl. repeats).
  - With ManageGroupsAs.TEXT, groups are converted to String columns (default behavior).
  - With ManageGroupsAs.SKIP, groups are ignored.
  - With ManageGroupsAs.ERROR, reading a parquet file containing groups will throw an exception.
- Parameters:
manageGroupsAs - the ManageGroupsAs option
- Returns:
- this builder
-
withOnlyTheseColumns
Read only a subset of columns, identified by name. If used with the columnTypes(ColumnType[]) option, the ColumnType array must contain only the selected columns in the order they were provided.
- Parameters:
columns - the column names to read
- Returns:
- this builder
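The ordering requirement between withOnlyTheseColumns and columnTypes(ColumnType[]) amounts to positional pairing: the first type applies to the first selected name, and so on. The sketch below models that pairing with plain strings. It is a hypothetical illustration, not library code: the class SelectedColumnTypesSketch, the method pair, and the length check are assumptions, and this page does not say how the library reacts to a mismatch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model of pairing selected column names with types by position.
final class SelectedColumnTypesSketch {
    static Map<String, String> pair(String[] selectedColumns, String[] columnTypes) {
        // Assumed sanity check: one type per selected column.
        if (selectedColumns.length != columnTypes.length) {
            throw new IllegalArgumentException(
                    "expected " + selectedColumns.length + " types, got " + columnTypes.length);
        }
        // LinkedHashMap keeps the selection order.
        Map<String, String> typeByName = new LinkedHashMap<>();
        for (int i = 0; i < selectedColumns.length; i++) {
            typeByName.put(selectedColumns[i], columnTypes[i]);
        }
        return typeByName;
    }

    public static void main(String[] args) {
        // Hypothetical column names; the types follow the same order as the selection.
        String[] selected = {"name", "price"};
        String[] types = {"STRING", "DOUBLE"};
        System.out.println(pair(selected, types)); // prints {name=STRING, price=DOUBLE}
    }
}
```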
-