Class RegexTokenizer
java.lang.Object
    org.apache.commons.text.similarity.RegexTokenizer

All Implemented Interfaces:
    Tokenizer<CharSequence>

class RegexTokenizer extends Object implements Tokenizer<CharSequence>
A simple word tokenizer that uses a regular expression to find words. It applies the regex (\w)+ to the input text to extract words from a given character sequence.
Since:
    1.0
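
The matching behavior described above can be illustrated with a minimal sketch built only on java.util.regex. The class name WordPatternDemo and the helper words are hypothetical and exist only for this example; this is not the library's source. Note that in Java, \w matches only [a-zA-Z_0-9] unless Pattern.UNICODE_CHARACTER_CLASS is set, so punctuation such as an apostrophe splits a word in two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it only illustrates what the pattern (\w)+ matches.
public class WordPatternDemo {
    // The pattern quoted in the class description.
    private static final Pattern WORD = Pattern.compile("(\\w)+");

    // Collects every maximal run of word characters, in order of appearance.
    static List<String> words(CharSequence text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world_1!")); // prints [Hello, world_1]
        System.out.println(words("don't stop"));      // prints [don, t, stop]
    }
}
```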
Constructor Summary
Constructor          Description
RegexTokenizer()

Method Summary
Modifier and Type    Method                         Description
CharSequence[]       tokenize(CharSequence text)    Returns an array of tokens.

Method Detail
tokenize
public CharSequence[] tokenize(CharSequence text)
Returns an array of tokens.
Specified by:
    tokenize in interface Tokenizer<CharSequence>
Parameters:
    text - input text
Returns:
    array of tokens
Throws:
    IllegalArgumentException - if the input text is blank
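
The contract above (tokens out, IllegalArgumentException on blank input) could be sketched as follows. This is a stand-in under stated assumptions, not the library's implementation: the class TokenizeContractSketch, the helper isBlank, and the exception message are hypothetical, and "blank" is assumed to mean null, empty, or whitespace only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in that mirrors the documented contract of tokenize.
public class TokenizeContractSketch {
    private static final Pattern WORD = Pattern.compile("(\\w)+");

    // Returns the word tokens of text, or throws if text is blank.
    static CharSequence[] tokenize(CharSequence text) {
        if (isBlank(text)) {
            // The message is illustrative; the documentation does not specify one.
            throw new IllegalArgumentException("input text must not be blank");
        }
        List<CharSequence> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new CharSequence[0]);
    }

    // Assumption: blank means null, empty, or whitespace only.
    static boolean isBlank(CharSequence text) {
        if (text == null) {
            return true;
        }
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (CharSequence token : tokenize("to be, or not")) {
            System.out.println(token);
        }
        try {
            tokenize("   ");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected blank input");
        }
    }
}
```

Callers that may receive empty or whitespace-only input would need to check for it before calling tokenize, or catch the exception, since the documented method does not return an empty array in that case.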