Class CJKBigramFilterFactory

  • All Implemented Interfaces:
    TokenFilterFactory, IndexComponent

    public final class CJKBigramFilterFactory
    extends AbstractTokenFilterFactory
    Factory that creates a CJKBigramFilter to form bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.

    CJK types are set by these tokenizers, but you can also use flags to explicitly control which of the CJK scripts are turned into bigrams.

    By default, when a CJK character has no adjacent characters to form a bigram, it is output in unigram form. If you want to always output both unigrams and bigrams, set the outputUnigrams flag. This can be used for a combined unigram+bigram approach.

    In all cases, all non-CJK input is passed through unmodified.
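
    The rules above (script selection, unigram fallback, the outputUnigrams option, and pass-through of other input) can be illustrated with a small standalone model. This is a simplified sketch, not the Lucene implementation: the class CjkBigramSketch and its methods are hypothetical names, it classifies tokens by Unicode script rather than by the token types the tokenizers assign, it treats any unbroken run of single-character CJK tokens as adjacent, and the ordering of unigrams relative to bigrams is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of CJK bigram formation (not the Lucene code). */
public class CjkBigramSketch {

    /** All four CJK scripts; pass a subset to model the script flags. */
    public static final Set<Character.UnicodeScript> ALL_CJK = EnumSet.of(
            Character.UnicodeScript.HAN,
            Character.UnicodeScript.HIRAGANA,
            Character.UnicodeScript.KATAKANA,
            Character.UnicodeScript.HANGUL);

    /** True if the token is a single character belonging to one of the enabled scripts. */
    public static boolean isCandidate(String token, Set<Character.UnicodeScript> scripts) {
        return token.codePointCount(0, token.length()) == 1
                && scripts.contains(Character.UnicodeScript.of(token.codePointAt(0)));
    }

    /**
     * Turns runs of adjacent candidate tokens into overlapping bigrams.
     * A candidate with no neighbour is emitted as a unigram; every other
     * token is passed through unchanged.
     */
    public static List<String> bigrams(List<String> tokens,
                                       Set<Character.UnicodeScript> scripts,
                                       boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (!isCandidate(tokens.get(i), scripts)) {
                out.add(tokens.get(i)); // non-CJK or disabled script: unmodified
                i++;
                continue;
            }
            // Find the end (exclusive) of the current run of candidate tokens.
            int end = i;
            while (end < tokens.size() && isCandidate(tokens.get(end), scripts)) {
                end++;
            }
            int runLength = end - i;
            for (int j = i; j < end; j++) {
                if (outputUnigrams || runLength == 1) {
                    out.add(tokens.get(j));
                }
                if (j + 1 < end) {
                    out.add(tokens.get(j) + tokens.get(j + 1));
                }
            }
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("学", "生", "が", "Lucene", "本");
        // Default behaviour: bigrams, with a unigram only for the isolated character.
        System.out.println(bigrams(tokens, ALL_CJK, false));
        // Combined unigram+bigram output.
        System.out.println(bigrams(tokens, ALL_CJK, true));
        // Only Han is bigrammed; the Hiragana token is passed through as-is.
        System.out.println(bigrams(tokens, EnumSet.of(Character.UnicodeScript.HAN), false));
    }
}
```

    In this model the isolated character 本 is emitted as a unigram even when outputUnigrams is false, and the token Lucene is never altered, which mirrors the behaviour described above.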

    • Method Detail

      • create

        public org.apache.lucene.analysis.TokenStream create(org.apache.lucene.analysis.TokenStream tokenStream)