Package io.debezium.text
Class TokenStream.BasicTokenizer
java.lang.Object
    io.debezium.text.TokenStream.BasicTokenizer

All Implemented Interfaces: TokenStream.Tokenizer
Enclosing class: TokenStream
public static class TokenStream.BasicTokenizer extends Object implements TokenStream.Tokenizer
A basic TokenStream.Tokenizer implementation that ignores whitespace but includes tokens for individual symbols, the period ('.'), single-quoted strings, double-quoted strings, whitespace-delimited words, and optionally comments.
Note that this Tokenizer may not be appropriate in many situations; it is provided merely as a convenience for those situations that happen to be able to use it.
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
static int COMMENT - The token type for tokens that consist of all the characters between "/*" and "*/" or between "//" and the next line terminator (e.g., '\n', '\r' or "\r\n").
static int DECIMAL - The token type for tokens that consist of an individual '.' character.
static int DOUBLE_QUOTED_STRING - The token type for tokens that consist of all the characters within double-quotes.
static int SINGLE_QUOTED_STRING - The token type for tokens that consist of all the characters within single-quotes.
static int SYMBOL - The token type for tokens that consist of an individual "symbol" character.
private boolean useComments
static int WORD - The token type for tokens that represent an unquoted string containing a character sequence made up of non-whitespace and non-symbol characters.
-
Constructor Summary
Constructors (Modifier, Constructor, Description):
protected BasicTokenizer(boolean useComments)
-
Method Summary
Methods (Modifier and Type, Method, Description):
void tokenize(TokenStream.CharacterStream input, TokenStream.Tokens tokens) - Process the supplied characters and construct the appropriate TokenStream.Token objects.
-
-
-
Field Detail
-
WORD
public static final int WORD
The token type for tokens that represent an unquoted string containing a character sequence made up of non-whitespace and non-symbol characters.
See Also: Constant Field Values
-
SYMBOL
public static final int SYMBOL
The token type for tokens that consist of an individual "symbol" character. The set of characters includes: -(){}*,;+%?$[]!<>|=:
See Also: Constant Field Values
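As an illustration of the symbol set above, the following minimal sketch checks whether a character belongs to it. The class name SymbolCheck and the method isSymbol are hypothetical and not part of the Debezium API, and the sketch assumes the listed characters are the complete set.

```java
// Hypothetical helper; not part of io.debezium.text.
final class SymbolCheck {
    // Symbol characters exactly as listed in the SYMBOL field description.
    // Assumption: the list in the description is exhaustive.
    private static final String SYMBOLS = "-(){}*,;+%?$[]!<>|=:";

    /** Returns true if the character is one of the listed symbol characters. */
    static boolean isSymbol(char c) {
        return SYMBOLS.indexOf(c) >= 0;
    }
}
```

Note that the period is not in this list; according to the field descriptions it has its own token type, DECIMAL.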
-
DECIMAL
public static final int DECIMAL
The token type for tokens that consist of an individual '.' character.
See Also: Constant Field Values
-
SINGLE_QUOTED_STRING
public static final int SINGLE_QUOTED_STRING
The token type for tokens that consist of all the characters within single-quotes. Single quote characters are included if they are preceded (escaped) by a '\' character.
See Also: Constant Field Values
-
DOUBLE_QUOTED_STRING
public static final int DOUBLE_QUOTED_STRING
The token type for tokens that consist of all the characters within double-quotes. Double quote characters are included if they are preceded (escaped) by a '\' character.
See Also: Constant Field Values
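The escaping rule described for SINGLE_QUOTED_STRING and DOUBLE_QUOTED_STRING can be sketched as a scan for the matching closing quote. This is a simplified illustration, not the library's implementation: the class QuotedScan and the method findClosingQuote are hypothetical, and IllegalArgumentException stands in for the ParsingException that tokenize is documented to throw for an unclosed quote.

```java
// Hypothetical helper; not part of io.debezium.text.
final class QuotedScan {
    /**
     * Returns the index of the closing quote that matches the opening quote
     * at position start. A quote preceded by a backslash is treated as part
     * of the string rather than as the closing quote.
     */
    static int findClosingQuote(String text, int start) {
        char quote = text.charAt(start); // either ' or "
        for (int i = start + 1; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < text.length() && text.charAt(i + 1) == quote) {
                i++; // skip the escaped quote so it does not end the string
            } else if (c == quote) {
                return i;
            }
        }
        // Stand-in for ParsingException (assumption for this sketch).
        throw new IllegalArgumentException("No matching " + quote + " for quote at index " + start);
    }
}
```

For example, calling findClosingQuote on the text 'it\'s' with start 0 skips the escaped quote and returns the index of the final quote character.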
-
COMMENT
public static final int COMMENT
The token type for tokens that consist of all the characters between "/*" and "*/" or between "//" and the next line terminator (e.g., '\n', '\r' or "\r\n").
See Also: Constant Field Values
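The two comment forms described above can be sketched as a function that finds where a comment ends. The class CommentScan and the method commentEnd are hypothetical names used only for illustration. Two details are assumptions of this sketch rather than documented behavior: an unterminated block comment is taken to run to the end of the input, and the line terminator is not counted as part of a line comment.

```java
// Hypothetical helper; not part of io.debezium.text.
final class CommentScan {
    /**
     * Returns the exclusive end index of the comment that starts at position
     * start, or -1 if no comment starts there.
     */
    static int commentEnd(String text, int start) {
        if (text.startsWith("/*", start)) {
            int close = text.indexOf("*/", start + 2);
            // Assumption: an unterminated block comment runs to the end of the input.
            return close < 0 ? text.length() : close + 2;
        }
        if (text.startsWith("//", start)) {
            int i = start + 2;
            // Stop at '\n' or '\r'; this also handles "\r\n" because '\r' comes first.
            while (i < text.length() && text.charAt(i) != '\n' && text.charAt(i) != '\r') {
                i++;
            }
            return i; // assumption: the terminator itself is excluded
        }
        return -1;
    }
}
```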
-
useComments
private final boolean useComments
-
-
Method Detail
-
tokenize
public void tokenize(TokenStream.CharacterStream input, TokenStream.Tokens tokens) throws ParsingException
Description copied from interface: TokenStream.Tokenizer
Process the supplied characters and construct the appropriate TokenStream.Token objects.
Specified by: tokenize in interface TokenStream.Tokenizer
Parameters:
input - the character input stream; never null
tokens - the factory for TokenStream.Token objects, which records the order in which the tokens are created
Throws:
ParsingException - if there is an error while processing the character stream (e.g., a quote is not closed, etc.)
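To make the classification concrete, here is a much-simplified, self-contained sketch of a tokenizing loop that produces only WORD, SYMBOL and DECIMAL tokens and skips whitespace. It is not the Debezium implementation and does not use TokenStream.CharacterStream or TokenStream.Tokens; the class MiniTokenizer and its "TYPE:value" string output are hypothetical. Quoted strings and comments are deliberately left out, and treating '.' as a word boundary is an assumption based on it having its own token type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified illustration; not part of io.debezium.text.
final class MiniTokenizer {
    // Symbol characters as listed in the SYMBOL field description.
    private static final String SYMBOLS = "-(){}*,;+%?$[]!<>|=:";

    /** Splits text into "TYPE:value" strings, ignoring whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (c == '.') {
                tokens.add("DECIMAL:.");
                i++;
            } else if (SYMBOLS.indexOf(c) >= 0) {
                tokens.add("SYMBOL:" + c);
                i++;
            } else {
                // A word runs until the next whitespace, symbol, or period.
                int start = i;
                while (i < text.length() && !isWordBoundary(text.charAt(i))) {
                    i++;
                }
                tokens.add("WORD:" + text.substring(start, i));
            }
        }
        return tokens;
    }

    private static boolean isWordBoundary(char c) {
        return Character.isWhitespace(c) || c == '.' || SYMBOLS.indexOf(c) >= 0;
    }
}
```

With this sketch, the input a.b = 10; yields the tokens WORD:a, DECIMAL:., WORD:b, SYMBOL:=, WORD:10 and SYMBOL:; in that order.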
-
-