| Class | Description |
|---|---|
| GosenAnalyzer | Analyzer for Japanese that uses the "Sen" morphological analyzer. |
| GosenBasicFormFilter | Replaces term text with the BasicFormAttribute. |
| GosenKatakanaStemFilter | Converts a katakana word to a normalized form by stemming the KATAKANA-HIRAGANA PROLONGED SOUND MARK (U+30FC) when it is the last character of the string. |
| GosenPartOfSpeechKeepFilter | Removes tokens that do NOT match a set of POS tags. |
| GosenPartOfSpeechStopFilter | Removes tokens that match a set of POS tags. |
| GosenPunctuationFilter | Removes punctuation tokens. |
| GosenReadingsFormFilter | Replaces term text with the ReadingsAttribute. |
| GosenTokenizer | Japanese tokenizer that uses the "Sen" morphological analyzer. |
| GosenWidthFilter | A TokenFilter that normalizes CJK width differences: it folds fullwidth ASCII variants into the equivalent Basic Latin characters and halfwidth Katakana variants into the equivalent kana. |
| StreamTagger2 | Breaks text into sentences according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/). |
| ToStringUtil |