Package org.opensearch.analysis.common
Class CJKBigramFilterFactory
java.lang.Object
org.opensearch.index.AbstractIndexComponent
org.opensearch.index.analysis.AbstractTokenFilterFactory
org.opensearch.analysis.common.CJKBigramFilterFactory
All Implemented Interfaces:
TokenFilterFactory, IndexComponent
Factory that creates a
CJKBigramFilter to form bigrams of CJK terms
that are generated from StandardTokenizer or ICUTokenizer.
CJK types are set by these tokenizers, but you can also use flags to explicitly control which of the CJK scripts are turned into bigrams.
By default, when a CJK character has no adjacent characters to form a bigram,
it is output in unigram form. If you want to always output both unigrams and
bigrams, set the outputUnigrams flag. This can be used for a
combined unigram+bigram approach.
In all cases, all non-CJK input is passed through unmodified.
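The bigram behavior described above can be illustrated with a small standalone sketch. It is not the Lucene CJKBigramFilter implementation and does not use the OpenSearch or Lucene APIs. The class CjkBigramSketch and its methods are hypothetical names, the input is split on whitespace rather than by StandardTokenizer or ICUTokenizer, and "CJK" is assumed to mean the Han, Hiragana, Katakana, and Hangul scripts. The order in which unigrams and bigrams are emitted is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of CJK bigram formation (hypothetical, not the Lucene filter).
public class CjkBigramSketch {

    // Assumption: a code point is "CJK" if its script is Han, Hiragana, Katakana, or Hangul.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA
            || script == Character.UnicodeScript.HANGUL;
    }

    // Splits text into CJK and non-CJK runs; CJK runs become bigrams,
    // non-CJK runs are passed through unmodified.
    static List<String> bigrams(String text, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (Character.isWhitespace(cps[i])) {
                i++;
                continue;
            }
            int start = i;
            boolean cjk = isCjk(cps[i]);
            while (i < cps.length
                    && !Character.isWhitespace(cps[i])
                    && isCjk(cps[i]) == cjk) {
                i++;
            }
            if (cjk) {
                emitCjkRun(cps, start, i, outputUnigrams, out);
            } else {
                out.add(new String(cps, start, i - start));
            }
        }
        return out;
    }

    // Emits overlapping bigrams for one CJK run. A lone character is emitted
    // as a unigram; with outputUnigrams set, every character is also emitted.
    static void emitCjkRun(int[] cps, int start, int end,
                           boolean outputUnigrams, List<String> out) {
        int length = end - start;
        for (int k = start; k < end; k++) {
            if (outputUnigrams || length == 1) {
                out.add(new String(cps, k, 1));
            }
            if (k + 1 < end) {
                out.add(new String(cps, k, 2));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(bigrams("東京都 abc", false));
        System.out.println(bigrams("東京都 abc", true));
    }
}
```

In this sketch, the three-character run 東京都 yields the bigrams 東京 and 京都 when outputUnigrams is false, and abc is passed through unchanged.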
Field Summary
Fields inherited from class org.opensearch.index.AbstractIndexComponent
deprecationLogger, indexSettings, logger
Fields inherited from interface org.opensearch.index.analysis.TokenFilterFactory
IDENTITY_FILTER
Method Summary
Modifier and Type: org.apache.lucene.analysis.TokenStream
Method: create(org.apache.lucene.analysis.TokenStream tokenStream)
Methods inherited from class org.opensearch.index.analysis.AbstractTokenFilterFactory
name
Methods inherited from class org.opensearch.index.AbstractIndexComponent
getIndexSettings, index
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.opensearch.index.analysis.TokenFilterFactory
breaksFastVectorHighlighter, getAnalysisMode, getChainAwareTokenFilterFactory, normalize
Method Details
create
public org.apache.lucene.analysis.TokenStream create(org.apache.lucene.analysis.TokenStream tokenStream)
getSynonymFilter