public class GosenAnalyzer
extends org.apache.lucene.analysis.util.StopwordAnalyzerBase
| Constructor and Description |
|---|
| GosenAnalyzer(): Create a GosenAnalyzer with the default stopwords and stoptags and no stemExclusionSet. |
| GosenAnalyzer(org.apache.lucene.analysis.util.CharArraySet stopwords, Set<String> stoptags, org.apache.lucene.analysis.util.CharArraySet stemExclusionSet, String dictionaryDir): Create a GosenAnalyzer with the specified stopwords, stoptags, stemExclusionSet, and dictionaryDir. |
| GosenAnalyzer(String dictionaryDir): Create a GosenAnalyzer with the default stopwords and stoptags, no stemExclusionSet, and the given dictionaryDir. |
| Modifier and Type | Method and Description |
|---|---|
| protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents | createComponents(String field): Creates Analyzer.TokenStreamComponents used to tokenize all the text in the provided Reader. |
| static Set<?> | getDefaultStopSet() |
| static Set<String> | getDefaultStopTags() |
Methods inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase: getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

public GosenAnalyzer()

public GosenAnalyzer(String dictionaryDir)

public GosenAnalyzer(org.apache.lucene.analysis.util.CharArraySet stopwords,
Set<String> stoptags,
org.apache.lucene.analysis.util.CharArraySet stemExclusionSet,
String dictionaryDir)
Parameters:
stopwords - a stopword set: words matching these (Surf
stoptags - a stoptags set: words containing these parts of speech will be removed from the stream.
stemExclusionSet - a stemming exclusion set: these words are ignored by GosenBasicFormFilter and GosenKatakanaStemFilter
dictionaryDir - the dictionary directory

public static Set<?> getDefaultStopSet()
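
The roles of the stopwords, stoptags, and stemExclusionSet parameters can be pictured with a small stand-alone model. This is only a sketch in plain Java: the `Token` class, the `apply` method, and the tag names `noun` and `particle` are hypothetical and are not part of Lucene or lucene-gosen, whose real part-of-speech tags come from the dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Stand-alone model of the three sets; not the Lucene or lucene-gosen API. */
public class StopSetSketch {

    /** Hypothetical token: a term plus its part-of-speech tag. */
    static final class Token {
        final String term;
        final String pos;
        boolean keyword; // true means "stemming filters should skip this token"

        Token(String term, String pos) {
            this.term = term;
            this.pos = pos;
        }
    }

    /** Drops stop-tagged tokens and stopwords, then marks stem exclusions. */
    static List<Token> apply(List<Token> in, Set<String> stopwords,
                             Set<String> stoptags, Set<String> stemExclusions) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            if (stoptags.contains(t.pos)) continue;   // removed by part of speech
            if (stopwords.contains(t.term)) continue; // removed by word match
            t.keyword = stemExclusions.contains(t.term); // kept, flagged if excluded from stemming
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
            new Token("\u30B5\u30FC\u30D0\u30FC", "noun"), // katakana "saabaa" (server)
            new Token("\u306E", "particle"),               // hiragana "no"
            new Token("\u3053\u3068", "noun"));            // hiragana "koto"
        List<Token> kept = apply(tokens,
            new HashSet<>(Arrays.asList("\u3053\u3068")),               // stopwords
            new HashSet<>(Arrays.asList("particle")),                   // stoptags
            new HashSet<>(Arrays.asList("\u30B5\u30FC\u30D0\u30FC")));  // stem exclusions
        for (Token t : kept) {
            System.out.println(t.term + " keyword=" + t.keyword);
        }
    }
}
```

In this model only the first token survives, and it is flagged as a keyword, which is the state in which a stemming filter would leave it unchanged.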
protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents createComponents(String field)
Creates Analyzer.TokenStreamComponents used to tokenize all the text in the provided Reader.
Specified by: createComponents in class org.apache.lucene.analysis.Analyzer
Returns: Analyzer.TokenStreamComponents built from a GosenTokenizer filtered with
GosenWidthFilter, GosenPunctuationFilter,
GosenPartOfSpeechStopFilter, StopFilter,
SetKeywordMarkerFilter if a stem exclusion set is provided,
GosenBasicFormFilter, GosenKatakanaStemFilter,
and LowerCaseFilter
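
The order of the chain matters, because each filter sees the output of the one before it. The following is a rough stand-alone model of that ordering in plain Java, not the actual implementation. It covers only the stages that need no dictionary, and it assumes that NFKC normalization is an acceptable approximation of width folding and that dropping tokens without a letter or digit approximates punctuation removal. The class and field names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/** Stand-alone model of an ordered filter chain; not the Lucene API. */
public class FilterChainSketch {

    // Stand-in for a width filter: NFKC folds full-width Latin to half-width
    // and half-width katakana to full-width (an approximation, see lead-in).
    static final UnaryOperator<List<String>> WIDTH = tokens -> tokens.stream()
        .map(t -> Normalizer.normalize(t, Normalizer.Form.NFKC))
        .collect(Collectors.toList());

    // Stand-in for a punctuation filter: drop tokens with no letter or digit.
    static final UnaryOperator<List<String>> PUNCTUATION = tokens -> tokens.stream()
        .filter(t -> t.codePoints().anyMatch(Character::isLetterOrDigit))
        .collect(Collectors.toList());

    // Stand-in for a lowercase filter.
    static final UnaryOperator<List<String>> LOWER = tokens -> tokens.stream()
        .map(t -> t.toLowerCase(Locale.ROOT))
        .collect(Collectors.toList());

    // Stand-in for a stop filter: drop tokens that exactly match a stopword.
    static UnaryOperator<List<String>> stop(Set<String> stopwords) {
        return tokens -> tokens.stream()
            .filter(t -> !stopwords.contains(t))
            .collect(Collectors.toList());
    }

    /** Applies each stage in order, feeding every stage the previous output. */
    static List<String> run(List<String> tokens, List<UnaryOperator<List<String>>> chain) {
        List<String> current = tokens;
        for (UnaryOperator<List<String>> stage : chain) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "\uFF2C\uFF55\uFF43\uFF45\uFF4E\uFF45", // full-width "Lucene"
            "\u3001",                               // ideographic comma
            "\u306E",                               // hiragana "no"
            "\uFF79\uFF9E\uFF70\uFF91");            // half-width katakana "geemu"
        Set<String> stopwords = new HashSet<>(Arrays.asList("\u306E"));
        List<UnaryOperator<List<String>>> chain =
            Arrays.asList(WIDTH, PUNCTUATION, stop(stopwords), LOWER);
        System.out.println(run(tokens, chain));
    }
}
```

With this input the model leaves two tokens: the half-width, lowercased `lucene` and the katakana word in full-width form. The comma and the stopword are dropped along the way.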