Package net.tlabs.tablesaw.parquet
Class TablesawParquetReadOptions.Builder
- java.lang.Object
-
- net.tlabs.tablesaw.parquet.TablesawParquetReadOptions.Builder
-
- Enclosing class:
- TablesawParquetReadOptions
public static class TablesawParquetReadOptions.Builder extends Object
-
-
Method Summary
Modifier and Type | Method | Description
TablesawParquetReadOptions.Builder | allowDuplicateColumnNames(Boolean allow) |
TablesawParquetReadOptions | build() |
TablesawParquetReadOptions.Builder | columnTypes(Function<String,tech.tablesaw.api.ColumnType> columnTypeFunction) |
TablesawParquetReadOptions.Builder | columnTypes(tech.tablesaw.api.ColumnType[] columnTypes) | If used in conjunction with the withOnlyTheseColumns(String...) option, the provided ColumnType array must contain only the selected columns in the order they were provided.
TablesawParquetReadOptions.Builder | columnTypesPartial(Function<String,Optional<tech.tablesaw.api.ColumnType>> columnTypeFunction) |
TablesawParquetReadOptions.Builder | columnTypesPartial(Map<String,tech.tablesaw.api.ColumnType> columnTypeByName) |
TablesawParquetReadOptions.Builder | columnTypesToDetect(List<tech.tablesaw.api.ColumnType> columnTypesToDetect) | Selects ShortColumn or IntColumn for parquet short and byte columns, and FloatColumn or DoubleColumn for parquet float columns.
TablesawParquetReadOptions.Builder | dateFormat(String dateFormat) | Deprecated.
TablesawParquetReadOptions.Builder | dateFormat(DateTimeFormatter dateFormat) | This option is not used by TablesawParquetReadOptions
TablesawParquetReadOptions.Builder | dateTimeFormat(String dateTimeFormat) | Deprecated.
TablesawParquetReadOptions.Builder | dateTimeFormat(DateTimeFormatter dateTimeFormat) | This option is not used by TablesawParquetReadOptions
TablesawParquetReadOptions.Builder | header(boolean header) | This option is not used by TablesawParquetReadOptions
TablesawParquetReadOptions.Builder | ignoreZeroDecimal(boolean ignoreZeroDecimal) | This option is not used by TablesawParquetReadOptions
TablesawParquetReadOptions.Builder | locale(Locale locale) | This option is not used by TablesawParquetReadOptions
TablesawParquetReadOptions.Builder | maxCharsPerColumn(int maxCharsPerColumn) | This option is not used by TablesawParquetReadOptions
TablesawParquetReadOptions.Builder | minimizeColumnSizes() |
TablesawParquetReadOptions.Builder | missingValueIndicator(String... missingValueIndicator) | This option is not used by TablesawParquetReadOptions
TablesawParquetReadOptions.Builder | sample(boolean sample) | This option is not used by TablesawParquetReadOptions
TablesawParquetReadOptions.Builder | skipRowsWithInvalidColumnCount(boolean skipRowsWithInvalidColumnCount) |
TablesawParquetReadOptions.Builder | tableName(String tableName) |
TablesawParquetReadOptions.Builder | timeFormat(String timeFormat) | Deprecated.
TablesawParquetReadOptions.Builder | timeFormat(DateTimeFormatter timeFormat) | This option is not used by TablesawParquetReadOptions
TablesawParquetReadOptions.Builder | withConvertInt96ToTimestamp(boolean convertInt96ToTimestamp) | Option to read parquet INT96 values as TimeStamp.
TablesawParquetReadOptions.Builder | withManageGroupAs(TablesawParquetReadOptions.ManageGroupsAs manageGroupsAs) | Option for managing parquet groups (incl. repeats).
TablesawParquetReadOptions.Builder | withOnlyTheseColumns(String... columns) | Read only a subset of columns, identified by name.
TablesawParquetReadOptions.Builder | withUnnanotatedBinaryAs(TablesawParquetReadOptions.UnnanotatedBinaryAs unnanotatedBinaryAs) | Option for managing unannotated parquet Binary.
-
-
-
Constructor Detail
-
Builder
protected Builder(String inputPath)
-
-
Method Detail
-
build
public TablesawParquetReadOptions build()
-
header
public TablesawParquetReadOptions.Builder header(boolean header)
This option is not used by TablesawParquetReadOptions
-
tableName
public TablesawParquetReadOptions.Builder tableName(String tableName)
-
sample
public TablesawParquetReadOptions.Builder sample(boolean sample)
This option is not used by TablesawParquetReadOptions
-
dateFormat
@Deprecated public TablesawParquetReadOptions.Builder dateFormat(String dateFormat)
Deprecated. This option is not used by TablesawParquetReadOptions
-
timeFormat
@Deprecated public TablesawParquetReadOptions.Builder timeFormat(String timeFormat)
Deprecated. This option is not used by TablesawParquetReadOptions
-
dateTimeFormat
@Deprecated public TablesawParquetReadOptions.Builder dateTimeFormat(String dateTimeFormat)
Deprecated. This option is not used by TablesawParquetReadOptions
-
dateFormat
public TablesawParquetReadOptions.Builder dateFormat(DateTimeFormatter dateFormat)
This option is not used by TablesawParquetReadOptions
-
timeFormat
public TablesawParquetReadOptions.Builder timeFormat(DateTimeFormatter timeFormat)
This option is not used by TablesawParquetReadOptions
-
dateTimeFormat
public TablesawParquetReadOptions.Builder dateTimeFormat(DateTimeFormatter dateTimeFormat)
This option is not used by TablesawParquetReadOptions
-
maxCharsPerColumn
public TablesawParquetReadOptions.Builder maxCharsPerColumn(int maxCharsPerColumn)
This option is not used by TablesawParquetReadOptions
-
locale
public TablesawParquetReadOptions.Builder locale(Locale locale)
This option is not used by TablesawParquetReadOptions
-
missingValueIndicator
public TablesawParquetReadOptions.Builder missingValueIndicator(String... missingValueIndicator)
This option is not used by TablesawParquetReadOptions
-
columnTypesToDetect
public TablesawParquetReadOptions.Builder columnTypesToDetect(List<tech.tablesaw.api.ColumnType> columnTypesToDetect)
Selects which Tablesaw column type is used for certain parquet numeric columns: ShortColumn or IntColumn for parquet short and byte columns, and FloatColumn or DoubleColumn for parquet float columns. If the list does not contain ColumnType.SHORT, an IntColumn will be used for parquet short and byte columns. If the list does not contain ColumnType.FLOAT, a DoubleColumn will be used for parquet float columns.
Parameters:
columnTypesToDetect - only checked for presence of ColumnType.SHORT and ColumnType.FLOAT
Returns:
this builder
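The selection rule described above can be modelled in a few lines. This is a minimal, self-contained sketch and not the library's code: the `Type` enum is a hypothetical stand-in for tech.tablesaw.api.ColumnType, and the class and method names are invented for illustration.

```java
import java.util.List;

final class ColumnTypeDetectionSketch {
    // Hypothetical stand-in for tech.tablesaw.api.ColumnType (only the constants needed here).
    enum Type { SHORT, INTEGER, FLOAT, DOUBLE }

    // Parquet short and byte columns: SHORT only if the list asks for it, otherwise INTEGER.
    static Type forShortOrByte(List<Type> columnTypesToDetect) {
        return columnTypesToDetect.contains(Type.SHORT) ? Type.SHORT : Type.INTEGER;
    }

    // Parquet float columns: FLOAT only if the list asks for it, otherwise DOUBLE.
    static Type forFloat(List<Type> columnTypesToDetect) {
        return columnTypesToDetect.contains(Type.FLOAT) ? Type.FLOAT : Type.DOUBLE;
    }

    public static void main(String[] args) {
        List<Type> compact = List.of(Type.SHORT, Type.FLOAT);
        List<Type> wide = List.of(Type.INTEGER, Type.DOUBLE);
        System.out.println(forShortOrByte(compact) + " " + forFloat(compact));
        System.out.println(forShortOrByte(wide) + " " + forFloat(wide));
    }
}
```

Any other entries in the list have no effect on the outcome, which matches the note that the parameter is only checked for the presence of the two constants.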
-
minimizeColumnSizes
public TablesawParquetReadOptions.Builder minimizeColumnSizes()
-
ignoreZeroDecimal
public TablesawParquetReadOptions.Builder ignoreZeroDecimal(boolean ignoreZeroDecimal)
This option is not used by TablesawParquetReadOptions
-
allowDuplicateColumnNames
public TablesawParquetReadOptions.Builder allowDuplicateColumnNames(Boolean allow)
-
skipRowsWithInvalidColumnCount
public TablesawParquetReadOptions.Builder skipRowsWithInvalidColumnCount(boolean skipRowsWithInvalidColumnCount)
-
columnTypes
public TablesawParquetReadOptions.Builder columnTypes(tech.tablesaw.api.ColumnType[] columnTypes)
If used in conjunction with the withOnlyTheseColumns(String...) option, the provided ColumnType array must contain only the selected columns in the order they were provided.
-
columnTypes
public TablesawParquetReadOptions.Builder columnTypes(Function<String,tech.tablesaw.api.ColumnType> columnTypeFunction)
-
columnTypesPartial
public TablesawParquetReadOptions.Builder columnTypesPartial(Function<String,Optional<tech.tablesaw.api.ColumnType>> columnTypeFunction)
-
columnTypesPartial
public TablesawParquetReadOptions.Builder columnTypesPartial(Map<String,tech.tablesaw.api.ColumnType> columnTypeByName)
-
withConvertInt96ToTimestamp
public TablesawParquetReadOptions.Builder withConvertInt96ToTimestamp(boolean convertInt96ToTimestamp)
Option to read parquet INT96 values as TimeStamp. False by default.
Parameters:
convertInt96ToTimestamp - set to true to read parquet INT96 values as TimeStamp, false to read as String.
Returns:
this builder
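For background, the sketch below decodes an INT96 value using the de facto layout written by Hive and Impala: 12 little-endian bytes, with the nanoseconds of the day in the first 8 bytes and the Julian day number in the last 4. Whether this library follows exactly these steps is an assumption, and the class and method names are hypothetical. Time zone handling is deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

final class Int96TimestampSketch {
    // Julian day number that corresponds to 1970-01-01.
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;

    // Decodes 12 little-endian bytes: 8 bytes of nanos-of-day, then 4 bytes of Julian day.
    static LocalDateTime decode(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 values are 12 bytes long");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        LocalDate date = LocalDate.ofEpochDay(julianDay - JULIAN_DAY_OF_EPOCH);
        return LocalDateTime.of(date, LocalTime.ofNanoOfDay(nanosOfDay));
    }

    // Inverse of decode, included so the example can build its own input.
    static byte[] encode(LocalDateTime timestamp) {
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(timestamp.toLocalTime().toNanoOfDay());
        buf.putInt((int) (timestamp.toLocalDate().toEpochDay() + JULIAN_DAY_OF_EPOCH));
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] raw = encode(LocalDateTime.of(2020, 1, 2, 3, 4, 5));
        System.out.println(decode(raw));
    }
}
```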
-
withUnnanotatedBinaryAs
public TablesawParquetReadOptions.Builder withUnnanotatedBinaryAs(TablesawParquetReadOptions.UnnanotatedBinaryAs unnanotatedBinaryAs)
Option for managing unannotated parquet Binary. With UnnanotatedBinaryAs.STRING, these binaries are converted to UTF-8 Strings. With UnnanotatedBinaryAs.HEXSTRING, these binaries are converted to hexadecimal Strings. With UnnanotatedBinaryAs.SKIP, these fields are skipped.
Parameters:
unnanotatedBinaryAs - the UnnanotatedBinaryAs option
Returns:
this builder
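To make the three modes concrete, here is a small stand-alone sketch of what each one does to a byte array. The `Mode` enum is a hypothetical stand-in for TablesawParquetReadOptions.UnnanotatedBinaryAs, and the uppercase hexadecimal output is an assumption; the library's actual formatting may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class UnannotatedBinarySketch {
    // Hypothetical stand-in for TablesawParquetReadOptions.UnnanotatedBinaryAs.
    enum Mode { STRING, HEXSTRING, SKIP }

    // Returns the converted value, or an empty Optional when the field is skipped.
    static Optional<String> convert(byte[] value, Mode mode) {
        switch (mode) {
            case STRING:
                // Interpret the raw bytes as UTF-8 text.
                return Optional.of(new String(value, StandardCharsets.UTF_8));
            case HEXSTRING:
                // Two hexadecimal digits per byte (uppercase is an assumption).
                StringBuilder hex = new StringBuilder(value.length * 2);
                for (byte b : value) {
                    hex.append(String.format("%02X", b & 0xFF));
                }
                return Optional.of(hex.toString());
            default:
                // SKIP: the field produces no column value.
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        byte[] raw = {0x48, 0x69};
        System.out.println(convert(raw, Mode.STRING));
        System.out.println(convert(raw, Mode.HEXSTRING));
        System.out.println(convert(raw, Mode.SKIP));
    }
}
```

HEXSTRING is the safer choice when the bytes are not known to be valid UTF-8, since a hexadecimal rendering never loses information.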
-
withManageGroupAs
public TablesawParquetReadOptions.Builder withManageGroupAs(TablesawParquetReadOptions.ManageGroupsAs manageGroupsAs)
Option for managing parquet groups (incl. repeats). With ManageGroupsAs.TEXT, groups are converted to String columns (default behavior). With ManageGroupsAs.SKIP, groups are ignored. With ManageGroupsAs.ERROR, reading a parquet file containing groups will throw an exception.
Parameters:
manageGroupsAs - the ManageGroupsAs option
Returns:
this builder
-
withOnlyTheseColumns
public TablesawParquetReadOptions.Builder withOnlyTheseColumns(String... columns)
Read only a subset of columns, identified by name. If used with the columnTypes(ColumnType[]) option, the ColumnType array must contain only the selected columns in the order they were provided.
Parameters:
columns - the column names to read
Returns:
this builder
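The ordering requirement amounts to a positional pairing: the i-th type applies to the i-th selected column name. The sketch below models that pairing with plain strings. It is illustrative only; the class and method are hypothetical, and rejecting arrays of different lengths is an assumption rather than documented library behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SelectedColumnTypesSketch {
    // Pairs each selected column with the type at the same position, keeping selection order.
    static Map<String, String> pair(String[] selectedColumns, String[] columnTypes) {
        if (selectedColumns.length != columnTypes.length) {
            // Assumption: one type is expected per selected column.
            throw new IllegalArgumentException("Expected one column type per selected column");
        }
        Map<String, String> typeByColumn = new LinkedHashMap<>();
        for (int i = 0; i < selectedColumns.length; i++) {
            typeByColumn.put(selectedColumns[i], columnTypes[i]);
        }
        return typeByColumn;
    }

    public static void main(String[] args) {
        // Column names and type names are made up for the example.
        String[] selected = {"id", "name"};
        String[] types = {"INTEGER", "STRING"};
        System.out.println(pair(selected, types));
    }
}
```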
-
-