Class ConcatenateGraphTokenFilterFactory

All Implemented Interfaces:
TokenFilterFactory, IndexComponent

public class ConcatenateGraphTokenFilterFactory extends AbstractTokenFilterFactory
Factory for ConcatenateGraphFilter. Adapted from ConcatenateGraphFilterFactory, with changes to some default values: token_separator defaults to a space, preserve_position_increments defaults to false to avoid duplicated separators, max_graph_expansions defaults to 100 because the original default of 10_000 seems unnecessarily large, and preserve_separator defaults to false.
  • preserve_separator: Whether ConcatenateGraphFilter.SEP_LABEL should separate the input tokens in the concatenated token. Applies only to LegacyESVersion earlier than LegacyESVersion.V_7_6_0, i.e. Lucene versions earlier than Version.LUCENE_8_4_0.
  • token_separator: Separator to use for concatenation. Must be a String with a single character or empty. If not present, DEFAULT_TOKEN_SEPARATOR will be used. If empty i.e. "", tokens will be concatenated without any separators.
  • preserve_position_increments: Whether to add an empty token for missing positions. If not present, DEFAULT_PRESERVE_POSITION_INCREMENTS will be used.
  • max_graph_expansions: If the tokenStream graph has more than this many possible paths, a TooComplexToDeterminizeException is thrown to protect the stability and memory of the machine. If not present, DEFAULT_MAX_GRAPH_EXPANSIONS will be used.
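
The token_separator rules above (a single character, an empty string, or a default when absent) can be illustrated with a small sketch. This is not the factory's implementation: the class ConcatenationSketch and its methods are hypothetical, it works on a flat list of strings rather than a Lucene token graph, and it only assumes the default separator is a space as stated in the class description.

```java
import java.util.List;

// Hypothetical illustration only; not part of OpenSearch or Lucene.
final class ConcatenationSketch {

    // Assumed default, taken from the description: token_separator is a space.
    static final String ASSUMED_DEFAULT_SEPARATOR = " ";

    // Mirrors the documented rule: absent -> default, otherwise the value
    // must be empty or exactly one character long.
    static String resolveSeparator(String configured) {
        if (configured == null) {
            return ASSUMED_DEFAULT_SEPARATOR;
        }
        if (configured.length() > 1) {
            throw new IllegalArgumentException(
                "token_separator must be a single character or empty, got: " + configured);
        }
        return configured;
    }

    // Joins a linear sequence of tokens into one concatenated token.
    static String concatenate(List<String> tokens, String configuredSeparator) {
        return String.join(resolveSeparator(configuredSeparator), tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("quick", "brown", "fox");
        System.out.println(concatenate(tokens, null)); // prints "quick brown fox"
        System.out.println(concatenate(tokens, ""));   // prints "quickbrownfox"
        System.out.println(concatenate(tokens, "_"));  // prints "quick_brown_fox"
    }
}
```

The real filter operates on a token stream graph and may emit one concatenated token per path, which is where max_graph_expansions applies; the sketch covers only the separator handling for a single path.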
  • Field Details

    • DEFAULT_TOKEN_SEPARATOR

      public static final String DEFAULT_TOKEN_SEPARATOR
    • DEFAULT_MAX_GRAPH_EXPANSIONS

      public static final int DEFAULT_MAX_GRAPH_EXPANSIONS
    • DEFAULT_PRESERVE_POSITION_INCREMENTS

      public static final boolean DEFAULT_PRESERVE_POSITION_INCREMENTS
  • Method Details

    • create

      public org.apache.lucene.analysis.TokenStream create(org.apache.lucene.analysis.TokenStream tokenStream)