public abstract class Tokenizer extends Object
The Tokenizer uses a Dictionary to help decompose strings into potential morphemes.
| Modifier and Type | Field and Description |
|---|---|
| `protected Node` | `bosNode` A Node representing a beginning-of-string |
| `protected Dictionary` | `dictionary` The Dictionary used to find possible morphemes |
| `protected Node` | `eosNode` A Node representing an end-of-string |
| `protected CToken` | `unknownCToken` A CToken representing an unknown morpheme |
| `protected String` | `unknownPartOfSpeechDescription` The part-of-speech code to use for unknown tokens |
| Constructor and Description |
|---|
| `Tokenizer(Dictionary dictionary, String unknownPartOfSpeechDescription)` Constructs a new Tokenizer that uses the specified Dictionary to find possible morphemes within a given string |
| Modifier and Type | Method and Description |
|---|---|
| `Node` | `getBOSNode()` Creates a unique beginning-of-string Node. |
| `Dictionary` | `getDictionary()` |
| `Node` | `getEOSNode()` Creates a unique end-of-string Node. |
| `Node` | `getUnknownNode(char[] surface, int start, int length, int span)` Creates an "unknown morpheme" Node with the specified characteristics. |
| `abstract Node` | `lookup(SentenceIterator iterator, char[] surface)` Searches for possible morphemes from the given SentenceIterator. |
protected final Dictionary dictionary
The Dictionary used to find possible morphemes

protected final String unknownPartOfSpeechDescription
The part-of-speech code to use for unknown tokens

public Tokenizer(Dictionary dictionary, String unknownPartOfSpeechDescription)
Constructs a new Tokenizer that uses the specified Dictionary to find possible morphemes within a given string
Parameters:
dictionary - The Dictionary to search within
unknownPartOfSpeechDescription - The part-of-speech code to use for unknown tokens

public Dictionary getDictionary()
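
As a rough illustration of how the two constructor arguments relate to the protected fields and to getDictionary(), the sketch below uses hypothetical stand-in types (SketchDictionary, SketchTokenizer, SimpleSketchTokenizer) and a made-up describe method. None of these names come from the library; only the shape of the constructor mirrors the signature documented above.

```java
// Hypothetical stand-in for Dictionary: NOT the library's real interface.
interface SketchDictionary {
    boolean contains(String surface);
}

// Hypothetical stand-in for Tokenizer, mirroring only the documented constructor shape.
abstract class SketchTokenizer {
    protected final SketchDictionary dictionary;
    protected final String unknownPartOfSpeechDescription;

    SketchTokenizer(SketchDictionary dictionary, String unknownPartOfSpeechDescription) {
        this.dictionary = dictionary;
        this.unknownPartOfSpeechDescription = unknownPartOfSpeechDescription;
    }

    public SketchDictionary getDictionary() {
        return dictionary;
    }

    // Illustrative method only; the real class exposes lookup(...) instead.
    abstract String describe(String surface);
}

class SimpleSketchTokenizer extends SketchTokenizer {
    SimpleSketchTokenizer(SketchDictionary dictionary, String unknownCode) {
        super(dictionary, unknownCode);
    }

    @Override
    String describe(String surface) {
        // Fall back to the unknown part-of-speech code when the dictionary has no entry.
        return dictionary.contains(surface) ? "known" : unknownPartOfSpeechDescription;
    }
}

class ConstructorSketch {
    static String demo(String surface) {
        SketchDictionary dict = s -> "cat".equals(s); // toy one-word dictionary
        SketchTokenizer tokenizer = new SimpleSketchTokenizer(dict, "UNKNOWN");
        return tokenizer.describe(surface);
    }

    public static void main(String[] args) {
        System.out.println(demo("cat")); // prints "known"
        System.out.println(demo("zzz")); // prints "UNKNOWN"
    }
}
```

Because both fields are final, a subclass can read them freely but cannot swap the dictionary after construction.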
public Node getBOSNode()
Creates a unique beginning-of-string Node. The Node returned by this method is freshly cloned and not an alias of any other Node.
Returns:
Node

public Node getEOSNode()
Creates a unique end-of-string Node. The Node returned by this method is freshly cloned and not an alias of any other Node.

public Node getUnknownNode(char[] surface, int start, int length, int span)
Creates an "unknown morpheme" Node with the specified characteristics.
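
The "freshly cloned and not an alias" guarantee of getBOSNode() and getEOSNode() can be pictured with the minimal sketch below. It assumes the tokenizer keeps a template node and hands out copies of it; CloneSketchNode and CloneSketchTokenizer are hypothetical stand-ins, and the library's actual Node may be copied differently.

```java
// Hypothetical stand-in for Node: only what is needed to show the cloning idea.
class CloneSketchNode implements Cloneable {
    String label;

    CloneSketchNode(String label) {
        this.label = label;
    }

    @Override
    public CloneSketchNode clone() {
        try {
            return (CloneSketchNode) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: the class is Cloneable
        }
    }
}

// Hypothetical stand-in for Tokenizer, holding a template beginning-of-string node.
class CloneSketchTokenizer {
    protected final CloneSketchNode bosNode = new CloneSketchNode("BOS");

    CloneSketchNode getBOSNode() {
        // Return a copy so callers never share (alias) the template node.
        return bosNode.clone();
    }
}

class CloneSketch {
    public static void main(String[] args) {
        CloneSketchTokenizer tokenizer = new CloneSketchTokenizer();
        CloneSketchNode first = tokenizer.getBOSNode();
        CloneSketchNode second = tokenizer.getBOSNode();
        first.label = "changed";
        System.out.println(first != second); // prints "true": distinct objects
        System.out.println(second.label);    // prints "BOS": unaffected by the edit
    }
}
```

Handing out copies means a caller can link or mutate its node while building a lattice without affecting later calls.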
public abstract Node lookup(SentenceIterator iterator, char[] surface) throws IOException
Searches for possible morphemes from the given SentenceIterator. The Node that is returned links through Node.rnext to a list of matches which may be of varying lengths.
Parameters:
iterator - The iterator to search from
surface - The underlying character surface
Returns:
Nodes representing the possible morphemes beginning at the given index
Throws:
IOException
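
To make the rnext chain concrete, here is a toy sketch of a lookup that returns the head of a linked list of candidate matches of varying lengths, plus a loop that walks it. It is only an approximation of the idea: ChainNode and LookupSketch are hypothetical, a plain Set of words stands in for the Dictionary, and an int start index replaces the SentenceIterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Node: only the fields this sketch needs.
class ChainNode {
    final int start;
    final int length;
    ChainNode rnext; // next candidate beginning at the same position

    ChainNode(int start, int length) {
        this.start = start;
        this.length = length;
    }
}

class LookupSketch {
    // Returns the head of an rnext-linked chain of matches starting at `start`, or null.
    static ChainNode lookup(Set<String> words, char[] surface, int start) {
        ChainNode head = null;
        // Try the longest candidate first and prepend, so the chain ends up shortest-first.
        for (int len = surface.length - start; len >= 1; len--) {
            if (words.contains(new String(surface, start, len))) {
                ChainNode node = new ChainNode(start, len);
                node.rnext = head;
                head = node;
            }
        }
        return head;
    }

    // Walks the chain and collects the matched substrings.
    static List<String> candidates(Set<String> words, String text, int start) {
        char[] surface = text.toCharArray();
        List<String> out = new ArrayList<>();
        for (ChainNode n = lookup(words, surface, start); n != null; n = n.rnext) {
            out.add(new String(surface, n.start, n.length));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(Arrays.asList("a", "an", "ant", "t"));
        System.out.println(candidates(words, "ant", 0)); // prints "[a, an, ant]"
    }
}
```

Each node records only a start and a length into the shared surface array, so several overlapping candidates can coexist without copying the text.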