@NotThreadSafe public class TokenStream extends Object
A TokenStream object represents a stream of TokenStream.Token objects, each of which represents a word, symbol, comment,
or other lexically relevant piece of information. This simple framework makes it very easy to create a parser that walks
through (or "consumes") the tokens in the order they appear and does something useful with that content (usually creating another
representation of the content, such as a domain-specific Abstract Syntax Tree or object model).
This simple framework consists of a couple of pieces that fit together to do the whole job of parsing input content.
The TokenStream.Tokenizer is responsible for consuming the character-level input content and constructing TokenStream.Token objects for
the different words, symbols, or other meaningful elements contained in the content. Each Token object is a simple object that
records the character(s) that make up the token's value, but it does this in a very lightweight and efficient way by pointing
to the original character stream. Each token can be assigned a parser-specific integral token type that makes it
easier to quickly figure out later in the process what kind of information each token represents. The general idea is to
keep the Tokenizer logic very simple; very often TokenStream.Tokenizers will merely look for the different kinds of characters
(e.g., symbols, letters, digits, etc.) as well as things like quoted strings and comments. Note that TokenStream.Tokenizers are
never called directly by the parser; instead, they are always given to the TokenStream, which calls the Tokenizer at the
appropriate time.
The TokenStream is supplied the input content, a Tokenizer implementation, and a few options. Its job is to prepare the
content for processing, call the Tokenizer implementation to create the series of Token objects, and then provide an interface
for walking through and consuming the tokens. This interface makes it possible to discover the value and type of the current
token, to consume the current token, and to move to the next token. Plus, the interface has been designed to make the code that
works with the tokens as readable as possible.
The final component in this framework is the Parser. The parser is really any class that takes as input the content to be parsed and that outputs some meaningful information. The parser will do this by defining the Tokenizer, constructing a TokenStream object, and then using the TokenStream to walk through the sequence of Tokens and produce some meaningful representation of the content. Parsers can create instances of some object model, or they can create a domain-specific Abstract Syntax Tree representation.
The benefit of dividing the responsibilities along these lines is that the TokenStream implementation is able to encapsulate quite a bit of tedious but very useful functionality, while still allowing a lot of flexibility as to what makes up the different tokens. It also makes the parser very easy to write and read (and thus maintain), without placing many restrictions on how that logic is to be defined. Plus, because the TokenStream takes responsibility for tracking the position of every token (including line and column numbers), it can automatically produce meaningful error messages.
A parser works with the tokens on the TokenStream using a variety of methods:

- The start() method must be called before any of the other methods. It performs initialization and tokenization, and prepares the internal state by finding the first token and setting an internal current token reference.
- The hasNext() method can be called repeatedly to determine whether there is another token after the current token. This is often useful when an unknown number of tokens is to be processed, and it behaves very much like the Iterator.hasNext() method.
- The consume() method returns the value of the current token and moves the current token pointer to the next available token.
- The consume(String) and consume(char) methods look at the current token and ensure the token's value matches the value supplied as a method parameter, or they throw a ParsingException if the values don't match. The consume(int) method works similarly, except that it attempts to match the token's type. And consume(String, String...) is a convenience method that is equivalent to calling consume(String) for each of the arguments.
- The canConsume(String) and canConsume(char) methods look at the current token and check whether the token's value matches the value supplied as a method parameter. If there is a match, the method advances the current token reference and returns true. Otherwise, the current token does not match, and the method returns false without advancing the current token reference or throwing a ParsingException. Similarly, the canConsume(int) method checks the token's type rather than the value, consuming the token and returning true if there is a match, or just returning false if there is no match. The canConsume(String, String...) method determines whether all of the supplied values can be consumed in the given order.
- The matches(String) and matches(char) methods look at the current token and check whether the token's value matches the value supplied as a method parameter. The method returns whether there was a match, but does not advance the current token pointer. Similarly, the matches(int) method checks the token's type rather than the value. The matches(String, String...) method is a convenience method that is equivalent to calling matches(String) for each of the arguments, and the matches(int, int...) method is a convenience method that is equivalent to calling matches(int) for each of the arguments.
- The matchesAnyOf(String, String...) method looks at the current token and checks whether the token's value matches at least one of the values supplied as method parameters. The method returns whether there was a match, but does not advance the current token pointer. Similarly, the matchesAnyOf(int, int...) method checks the token's type rather than the value.

With these methods, it's very easy to create a parser that looks at the current token to decide what to do, consumes that token, and repeats the process.
Here is an example of a very simple parser that parses limited SQL SELECT and DELETE
statements, such as SELECT * FROM Customers,
SELECT Name, StreetAddress AS Address, City, Zip FROM Customers, or
DELETE FROM Customers WHERE Zip=12345:
```java
public class SampleSqlSelectParser {

    public List<Statement> parse( String ddl ) {
        TokenStream tokens = new TokenStream(ddl, new SqlTokenizer(), false);
        List<Statement> statements = new LinkedList<Statement>();
        tokens.start();
        while (tokens.hasNext()) {
            if (tokens.matches("SELECT")) {
                statements.add(parseSelect(tokens));
            } else {
                statements.add(parseDelete(tokens));
            }
        }
        return statements;
    }

    protected Select parseSelect( TokenStream tokens ) throws ParsingException {
        tokens.consume("SELECT");
        List<Column> columns = parseColumns(tokens);
        tokens.consume("FROM");
        String tableName = tokens.consume();
        return new Select(tableName, columns);
    }

    protected List<Column> parseColumns( TokenStream tokens ) throws ParsingException {
        List<Column> columns = new LinkedList<Column>();
        if (tokens.matches('*')) {
            tokens.consume(); // leave the columns empty to signal a wildcard
        } else {
            // Read the column names, separated by commas ...
            do {
                String columnName = tokens.consume();
                if (tokens.canConsume("AS")) {
                    String columnAlias = tokens.consume();
                    columns.add(new Column(columnName, columnAlias));
                } else {
                    columns.add(new Column(columnName, null));
                }
            } while (tokens.canConsume(','));
        }
        return columns;
    }

    protected Delete parseDelete( TokenStream tokens ) throws ParsingException {
        tokens.consume("DELETE", "FROM");
        String tableName = tokens.consume();
        tokens.consume("WHERE");
        String lhs = tokens.consume();
        tokens.consume('=');
        String rhs = tokens.consume();
        return new Delete(tableName, new Criteria(lhs, rhs));
    }
}

public abstract class Statement { ... }
public class Select extends Statement { ... }
public class Delete extends Statement { ... }
public class Column { ... }
```
This example shows an idiomatic way of writing a parser that is stateless and thread-safe. The parse(...) method
takes the input as a parameter, and returns the domain-specific representation that resulted from the parsing. All other
methods are utility methods that simply encapsulate common logic or make the code more readable.
In the example, the parse(...) method first creates a TokenStream object (using a Tokenizer implementation that is not
shown), and then loops as long as there are more tokens to read. As it loops, if the next token is "SELECT", the parser calls
the parseSelect(...) method, which immediately consumes a "SELECT" token, the names of the columns separated by
commas (or a '*' if all columns are to be selected), a "FROM" token, and the name of the table being queried. The
parseSelect(...) method returns a Select object, which is then added to the list of statements in the
parse(...) method. The parser handles "DELETE" statements in a similar manner.
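Using the sample parser is then just a matter of constructing it and passing in the content. As a sketch (reusing the SampleSqlSelectParser and Statement types defined above), and because the parser keeps no state between calls, a single instance can safely be reused for multiple inputs:

```java
// A sketch of invoking the stateless parser shown above ...
SampleSqlSelectParser parser = new SampleSqlSelectParser();
List<Statement> statements = parser.parse("SELECT * FROM Customers");
// The same instance can be reused for other content ...
statements.addAll(parser.parse("DELETE FROM Customers WHERE Zip=12345"));
```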
Very often, grammars do not require the case of keywords to match. This can make parsing a challenge, because every combination of case would otherwise need to be checked. The TokenStream framework provides a very simple solution that requires no more effort than supplying a boolean parameter to the constructor.
When a false value is provided for the caseSensitive parameter, the TokenStream performs all
matching operations as if each token's value were in uppercase only. This means that the arguments supplied to the
matches(...), canConsume(...), and consume(...) methods should be upper-cased. Note that
the actual value of each token remains in the case in which it appears in the input.
Of course, when the TokenStream is created with a true value for the caseSensitive parameter, the
matching is performed using the actual value as it appears in the input content.
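For example, here is a sketch of case-insensitive matching, reusing the (not shown) SqlTokenizer from the earlier example:

```java
// caseSensitive is false, so matching behaves as if every token value were uppercase ...
TokenStream tokens = new TokenStream("select * from Customers", new SqlTokenizer(), false);
tokens.start();
tokens.consume("SELECT"); // matches the "select" token; note the uppercase argument
tokens.consume('*');
tokens.consume("FROM");
String tableName = tokens.consume(); // returns the value in its original case: "Customers"
```

Supplying a lowercase argument such as consume("select") would not match here, because all matching is done against the uppercased token values.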
Many grammars are independent of line breaks and whitespace, allowing a lot of flexibility when writing the content. The TokenStream framework makes it very easy to ignore line breaks and whitespace: the Tokenizer implementation must simply not include the line break character sequences and whitespace in the token ranges. Since none of the tokens contain whitespace, the parser never has to deal with it.
Of course, many parsers will require that some whitespace be included. For example, whitespace within a quoted string may be needed by the parser. In this case, the Tokenizer should simply include the whitespace characters in the tokens.
Each parser will likely have its own TokenStream.Tokenizer implementation that contains the parser-specific logic about how to
break the content into token objects. Generally, the easiest way to do this is to iterate through the character sequence
passed into the tokenize(...) method, and use a switch statement to decide what to do with each character.
Here is the code for a very basic Tokenizer implementation that ignores whitespace, line breaks, and Java-style (multi-line and end-of-line) comments, while constructing single tokens for each quoted string.
```java
public class BasicTokenizer implements Tokenizer {

    public void tokenize( CharacterStream input,
                          Tokens tokens ) throws ParsingException {
        while (input.hasNext()) {
            char c = input.next();
            switch (c) {
                case ' ':
                case '\t':
                case '\n':
                case '\r':
                    // Just skip these whitespace characters ...
                    break;
                case '-':
                case '(':
                case ')':
                case '{':
                case '}':
                case '*':
                case ',':
                case ';':
                case '+':
                case '%':
                case '?':
                case '$':
                case '[':
                case ']':
                case '!':
                case '<':
                case '>':
                case '|':
                case '=':
                case ':':
                    tokens.addToken(input.index(), input.index() + 1, SYMBOL);
                    break;
                case '.':
                    tokens.addToken(input.index(), input.index() + 1, DECIMAL);
                    break;
                case '\"':
                    int startIndex = input.index();
                    Position startingPosition = input.position();
                    boolean foundClosingQuote = false;
                    while (input.hasNext()) {
                        c = input.next();
                        if (c == '\\' && input.isNext('"')) {
                            c = input.next(); // consume the '"' character, since it is escaped
                        } else if (c == '"') {
                            foundClosingQuote = true;
                            break;
                        }
                    }
                    if (!foundClosingQuote) {
                        throw new ParsingException(startingPosition, "No matching closing double quote found");
                    }
                    int endIndex = input.index() + 1; // beyond last character read
                    tokens.addToken(startIndex, endIndex, DOUBLE_QUOTED_STRING);
                    break;
                case '\'':
                    startIndex = input.index();
                    startingPosition = input.position();
                    foundClosingQuote = false;
                    while (input.hasNext()) {
                        c = input.next();
                        if (c == '\\' && input.isNext('\'')) {
                            c = input.next(); // consume the '\'' character, since it is escaped
                        } else if (c == '\'') {
                            foundClosingQuote = true;
                            break;
                        }
                    }
                    if (!foundClosingQuote) {
                        throw new ParsingException(startingPosition, "No matching closing single quote found");
                    }
                    endIndex = input.index() + 1; // beyond last character read
                    tokens.addToken(startIndex, endIndex, SINGLE_QUOTED_STRING);
                    break;
                case '/':
                    startIndex = input.index();
                    if (input.isNext('/')) {
                        // End-of-line comment ...
                        boolean foundLineTerminator = false;
                        while (input.hasNext()) {
                            c = input.next();
                            if (c == '\n' || c == '\r') {
                                foundLineTerminator = true;
                                break;
                            }
                        }
                        endIndex = input.index(); // the token won't include the '\n' or '\r' character(s)
                        if (!foundLineTerminator) ++endIndex; // must point beyond last char
                        if (c == '\r' && input.isNext('\n')) input.next();
                        if (useComments) {
                            tokens.addToken(startIndex, endIndex, COMMENT);
                        }
                    } else if (input.isNext('*')) {
                        // Multi-line comment ...
                        while (input.hasNext() && !input.isNext('*', '/')) {
                            c = input.next();
                        }
                        if (input.hasNext()) input.next(); // consume the '*'
                        if (input.hasNext()) input.next(); // consume the '/'
                        if (useComments) {
                            endIndex = input.index() + 1; // the token will include the '*' and '/' characters
                            tokens.addToken(startIndex, endIndex, COMMENT);
                        }
                    } else {
                        // just a regular slash ...
                        tokens.addToken(startIndex, startIndex + 1, SYMBOL);
                    }
                    break;
                default:
                    startIndex = input.index();
                    // Read until another whitespace/symbol/decimal/slash is found ...
                    while (input.hasNext() && !(input.isNextWhitespace() || input.isNextAnyOf("/.-(){}*,;+%?$[]!<>|=:"))) {
                        c = input.next();
                    }
                    endIndex = input.index() + 1; // beyond last character that was included
                    tokens.addToken(startIndex, endIndex, WORD);
            }
        }
    }
}
```
A TokenStream.Tokenizer with exactly this behavior can actually be obtained using the basicTokenizer(boolean) method. So
while this very basic implementation is not meant to be used in all situations, it may be useful in many of them.
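For example, a sketch of a parser that relies on this built-in tokenizer (here excluding comment tokens) might begin like this:

```java
// basicTokenizer(false) ignores comments; basicTokenizer(true) would include them as COMMENT tokens ...
TokenStream tokens = new TokenStream(content, TokenStream.basicTokenizer(false), true);
tokens.start();
while (tokens.hasNext()) {
    String value = tokens.consume();
    // ... do something with each token value ...
}
```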
| Modifier and Type | Class and Description |
|---|---|
| static class | TokenStream.BasicTokenizer: A basic TokenStream.Tokenizer implementation that ignores whitespace but includes tokens for individual symbols, the period ('.'), single-quoted strings, double-quoted strings, whitespace-delimited words, and optionally comments. |
| protected class | TokenStream.CaseInsensitiveToken |
| class | TokenStream.CaseInsensitiveTokenFactory |
| protected class | TokenStream.CaseSensitiveToken: An immutable TokenStream.Token that implements matching using case-sensitive logic. |
| class | TokenStream.CaseSensitiveTokenFactory |
| static class | TokenStream.CharacterArrayStream: An implementation of TokenStream.CharacterStream that works with a single character array. |
| static interface | TokenStream.CharacterStream: Interface used by a TokenStream.Tokenizer to iterate through the characters in the content input to the TokenStream. |
| static class | TokenStream.Marker: An opaque marker for a position within the token stream. |
| static interface | TokenStream.Token: The interface defining a token, which references the characters in the actual input character stream. |
| protected class | TokenStream.TokenFactory |
| static interface | TokenStream.Tokenizer: Interface for a Tokenizer component responsible for processing the characters in a TokenStream.CharacterStream and constructing the appropriate TokenStream.Token objects. |
| static interface | TokenStream.Tokens: A factory for Token objects, used by a TokenStream.Tokenizer to create tokens in the correct order. |
| Modifier and Type | Field and Description |
|---|---|
| static int | ANY_TYPE: A constant that can be used with the matches(int), matches(int, int...), consume(int), and canConsume(int) methods to signal that any token type is allowed to be matched. |
| static String | ANY_VALUE: A constant that can be used with the matches(String), matches(String, String...), consume(String), consume(String, String...), canConsume(String), and canConsume(String, String...) methods to signal that any value is allowed to be matched. |
| private boolean | caseSensitive |
| private boolean | completed |
| private TokenStream.Token | currentToken |
| private char[] | inputContent |
| protected String | inputString |
| private ListIterator<TokenStream.Token> | tokenIterator: This class navigates the Token objects using this iterator. |
| private TokenStream.Tokenizer | tokenizer |
| private List<TokenStream.Token> | tokens |
| Constructor and Description |
|---|
| TokenStream(String content, TokenStream.Tokenizer tokenizer, boolean caseSensitive) |
| Modifier and Type | Method and Description |
|---|---|
| boolean | advance(TokenStream.Marker marker): Advance the stream to the position described by the supplied marker. |
| static TokenStream.BasicTokenizer | basicTokenizer(boolean includeComments): Obtain a basic TokenStream.Tokenizer implementation that ignores whitespace but includes tokens for individual symbols, the period ('.'), single-quoted strings, double-quoted strings, whitespace-delimited words, and optionally comments. |
| boolean | canConsume(char expected): Attempt to consume this current token if it matches the expected value, and return whether this method was indeed able to consume the token. |
| boolean | canConsume(int expectedType): Attempt to consume this current token if it matches the expected token type, and return whether this method was indeed able to consume the token. |
| boolean | canConsume(int type, String expected): Attempt to consume this current token if it matches the expected type and value, and return whether this method was indeed able to consume the token. |
| boolean | canConsume(int type, String currentExpected, String... expectedForNextTokens): Attempt to consume this current token and the next tokens if and only if they match the expected type and values, and return whether this method was indeed able to consume all of the supplied tokens. |
| boolean | canConsume(Iterable<String> nextTokens): Attempt to consume this current token and the next tokens if and only if they match the expected values, and return whether this method was indeed able to consume all of the supplied tokens. |
| boolean | canConsume(String expected): Attempt to consume this current token if it matches the expected value, and return whether this method was indeed able to consume the token. |
| boolean | canConsume(String[] nextTokens): Attempt to consume this current token and the next tokens if and only if they match the expected values, and return whether this method was indeed able to consume all of the supplied tokens. |
| boolean | canConsume(String currentExpected, String... expectedForNextTokens): Attempt to consume this current token and the next tokens if and only if they match the expected values, and return whether this method was indeed able to consume all of the supplied tokens. |
| boolean | canConsumeAnyOf(int[] typeOptions): Attempt to consume the next token if it matches one of the supplied types. |
| boolean | canConsumeAnyOf(int firstTypeOption, int... additionalTypeOptions): Attempt to consume the next token if it matches one of the supplied types. |
| boolean | canConsumeAnyOf(Iterable<String> options): Attempt to consume the next token if it matches one of the supplied values. |
| boolean | canConsumeAnyOf(String[] options): Attempt to consume the next token if it matches one of the supplied values. |
| boolean | canConsumeAnyOf(String firstOption, String... additionalOptions): Attempt to consume the next token if it matches one of the supplied values. |
| boolean | canConsumeBoolean(BooleanConsumer consumer): Attempt to consume this current token if it can be parsed as a boolean, and return whether this method was indeed able to consume the token. |
| boolean | canConsumeInteger(IntConsumer consumer): Attempt to consume this current token if it can be parsed as an integer, and return whether this method was indeed able to consume the token. |
| boolean | canConsumeLong(LongConsumer consumer): Attempt to consume this current token if it can be parsed as a long, and return whether this method was indeed able to consume the token. |
| boolean | canConsumeWord(String expected): Attempt to consume this current token if it is TokenStream.BasicTokenizer.WORD and it matches the expected value, and return whether this method was indeed able to consume the token. |
| boolean | canConsumeWords(String currentExpected, String... expectedForNextTokens): Attempt to consume this current token and the next tokens if and only if they are of type TokenStream.BasicTokenizer.WORD and match the expected values, and return whether this method was indeed able to consume all of the supplied tokens. |
| String | consume(): Return the value of this token and move to the next token. |
| TokenStream | consume(char expected): Attempt to consume this current token as long as it matches the expected character, or throw an exception if the token does not match. |
| TokenStream | consume(int expectedType): Attempt to consume this current token as long as it matches the expected token type, or throw an exception if the token does not match. |
| TokenStream | consume(Iterable<String> nextTokens): Attempt to consume this current token and the next tokens as long as they match the expected values, or throw an exception if a token does not match. |
| TokenStream | consume(String expected): Attempt to consume this current token as long as it matches the expected value, or throw an exception if the token does not match. |
| TokenStream | consume(String[] nextTokens): Attempt to consume this current token and the next tokens as long as they match the expected values, or throw an exception if a token does not match. |
| TokenStream | consume(String expected, String... expectedForNextTokens): Attempt to consume this current token and the next tokens as long as they match the expected values, or throw an exception if a token does not match. |
| String | consumeAnyOf(int... typeOptions): Consume and return the next token, which must match one of the supplied types. |
| String | consumeAnyOf(String... options): Consume and return the next token, which must match one of the supplied values. |
| TokenStream | consumeThrough(char expected): Attempt to consume all tokens until the specified token is consumed, and then stop. |
| TokenStream | consumeThrough(char expected, char skipMatchingTokens): Attempt to consume all tokens until the specified token is consumed, and then stop. |
| TokenStream | consumeThrough(String expected): Attempt to consume all tokens until the specified token is consumed, and then stop. |
| TokenStream | consumeThrough(String expected, String skipMatchingTokens): Attempt to consume all tokens until the specified token is consumed, and then stop. |
| TokenStream | consumeUntil(char expected): Attempt to consume all tokens until the specified token is found, and then stop before consuming that token. |
| TokenStream | consumeUntil(char expected, char skipMatchingTokens): Attempt to consume all tokens until the specified token is found, and then stop before consuming that token. |
| TokenStream | consumeUntil(String expected): Attempt to consume all tokens until the specified token is found, and then stop before consuming that token. |
| TokenStream | consumeUntil(String expected, String... skipMatchingTokens): Attempt to consume all tokens until the specified token is found, and then stop before consuming that token. |
| TokenStream | consumeUntilEndOrOneOf(String... stopTokens): Consume the token stream until one of the stop tokens or the end of the stream is found. |
| (package private) TokenStream.Token | currentToken(): Get the current token. |
| (package private) String | generateFragment() |
| (package private) static String | generateFragment(String content, int indexOfProblem, int charactersToIncludeBeforeAndAfter, String highlightText): Utility method to generate a highlighted fragment of a particular point in the stream. |
| protected String | getContentBetween(int startIndex, Position end) |
| String | getContentBetween(Position starting, Position end): Gets the content string starting at the first position (inclusive) and continuing up to the end position (exclusive). |
| String | getContentBetween(TokenStream.Marker starting, Position end): Gets the content string starting at the specified marker (inclusive) and continuing up to the end position (exclusive). |
| String | getContentFrom(TokenStream.Marker starting): Gets the content string starting at the specified marker (inclusive) and continuing up to the next position (exclusive). |
| String | getInputString() |
| boolean | hasNext(): Determine if this stream has another token to be consumed. |
| protected List<TokenStream.Token> | initializeTokens(List<TokenStream.Token> tokens): Method to allow subclasses to pre-process the set of tokens and return the correct tokens to use. |
| TokenStream.Marker | mark(): Obtain a marker that records the current position so that the stream can be rewound (via rewind(Marker)) back to the mark, even after having been advanced beyond the mark. |
| boolean | matches(char expected): Determine if the current token matches the expected value. |
| boolean | matches(int expectedType): Determine if the current token matches the expected token type. |
| boolean | matches(int[] typesForNextTokens): Determine if the next few tokens have the supplied types. |
| boolean | matches(int currentExpectedType, int... expectedTypeForNextTokens): Determine if the next few tokens have the supplied types. |
| boolean | matches(int type, String expected): Determine if the current token matches the expected type and value. |
| boolean | matches(Iterable<String> nextTokens): Determine if the next few tokens match the expected values. |
| boolean | matches(String expected): Determine if the current token matches the expected value. |
| boolean | matches(String[] nextTokens): Determine if the next few tokens match the expected values. |
| boolean | matches(String currentExpected, String... expectedForNextTokens): Determine if the next few tokens match the expected values. |
| boolean | matchesAnyOf(int[] typeOptions): Determine if the next token has one of the supplied types. |
| boolean | matchesAnyOf(int firstTypeOption, int... additionalTypeOptions): Determine if the next token has one of the supplied types. |
| boolean | matchesAnyOf(int type, String firstOption, String... additionalOptions): Determine if the next token matches one of the supplied values of the expected type. |
| boolean | matchesAnyOf(Iterable<String> options): Determine if the next token matches one of the supplied values. |
| boolean | matchesAnyOf(String[] options): Determine if the next token matches one of the supplied values. |
| boolean | matchesAnyOf(String firstOption, String... additionalOptions): Determine if the next token matches one of the supplied values. |
| boolean | matchesAnyWordOf(String firstOption, String... additionalOptions): Determine if the next token matches one of the supplied values of the type TokenStream.BasicTokenizer.WORD. |
| boolean | matchesWord(String expected): Determine if the current token is TokenStream.BasicTokenizer.WORD and matches the expected value. |
| private void | moveToNextToken() |
| private void | moveToNextToken(List<TokenStream.Token> newTokens) |
| Position | nextPosition(): Get the position of the next (or current) token. |
| String | peek() |
| Position | previousPosition(): Get the position of the previous token. |
| Position | previousPosition(int count): Get the position of a token earlier in the stream from the current position. |
| TokenStream.Token | previousToken(int count): Get the previous token. |
| void | rewind(): Method to allow tokens to be re-used from the start without re-tokenizing content. |
| boolean | rewind(TokenStream.Marker marker): Reset the stream back to the position described by the supplied marker. |
| TokenStream | start(): Begin the token stream, including (if required) the tokenization of the input content. |
| protected void | throwNoMoreContent() |
| String | toString() |
public static final String ANY_VALUE
A constant that can be used with the matches(String), matches(String, String...), consume(String), consume(String, String...), canConsume(String), and canConsume(String, String...) methods to signal that any value is allowed to be matched. Note that this exact instance must be used; an equivalent string will not work.

public static final int ANY_TYPE
A constant that can be used with the matches(int), matches(int, int...), consume(int), and canConsume(int) methods to signal that any token type is allowed to be matched.

protected final String inputString

private final char[] inputContent

private final boolean caseSensitive

private final TokenStream.Tokenizer tokenizer

private List<TokenStream.Token> tokens

private ListIterator<TokenStream.Token> tokenIterator
This class navigates the Token objects using this iterator.
T1 T2 T3 T4 T5
ˆ ˆ ˆ
| | |
| | +- The position of the tokenIterator, where tokenIterator.hasNext() will return T3
| +---- The token referenced by currentToken
+-------- The logical position of the TokenStream object, where the "consume()" would return T2
private TokenStream.Token currentToken
private boolean completed
public TokenStream(String content, TokenStream.Tokenizer tokenizer, boolean caseSensitive)
public TokenStream start() throws ParsingException
ParsingException - if an error occurs during tokenization of the contentprotected List<TokenStream.Token> initializeTokens(List<TokenStream.Token> tokens)
tokens - the tokenspublic void rewind()
public TokenStream.Marker mark()
rewind(Marker) back to the mark even
after having been advanced beyond the mark.IllegalStateException - if this method was called before the stream was startedNoSuchElementException - if there are no more tokenspublic boolean rewind(TokenStream.Marker marker)
marker - the markeradvance(Marker)public boolean advance(TokenStream.Marker marker)
marker - the markerrewind(Marker)public Position previousPosition()
IllegalStateException - if this method was called before the stream was startedNoSuchElementException - if there is no previous tokenpublic Position previousPosition(int count)
count - the number of tokens before the current position (e.g., 1 for the previous position)IllegalStateException - if this method was called before the stream was startedNoSuchElementException - if there is no previous tokenpublic Position nextPosition()
IllegalStateException - if this method was called before the stream was startedNoSuchElementException - if there is no previous tokenpublic String consume() throws ParsingException, IllegalStateException
ParsingException - if there is no such token to consumeIllegalStateException - if this method was called before the stream was startedprotected void throwNoMoreContent()
throws ParsingException
ParsingExceptionpublic String peek() throws IllegalStateException
IllegalStateExceptionpublic TokenStream consume(String expected) throws ParsingException, IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
expected - the expected value of the current token
Throws:
ParsingException - if the current token doesn't match the supplied value
IllegalStateException - if this method was called before the stream was started

public TokenStream consume(char expected) throws ParsingException, IllegalStateException
Parameters:
expected - the expected character of the current token
Throws:
ParsingException - if the current token doesn't match the supplied value
IllegalStateException - if this method was called before the stream was started

public TokenStream consume(int expectedType) throws ParsingException, IllegalStateException
The ANY_TYPE constant can be used in the expected values as a wildcard.
Parameters:
expectedType - the expected token type of the current token
Throws:
ParsingException - if the current token doesn't match the supplied value
IllegalStateException - if this method was called before the stream was started

public TokenStream consume(String expected, String... expectedForNextTokens) throws ParsingException, IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
expected - the expected value of the current token
expectedForNextTokens - the expected values of the following tokens
Throws:
ParsingException - if the current token doesn't match the supplied value
IllegalStateException - if this method was called before the stream was started

public TokenStream consume(String[] nextTokens) throws ParsingException, IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
nextTokens - the expected values for the next tokens
Throws:
ParsingException - if the current token doesn't match the supplied value
IllegalStateException - if this method was called before the stream was started

public TokenStream consume(Iterable<String> nextTokens) throws ParsingException, IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
nextTokens - the expected values for the next tokens
Throws:
ParsingException - if the current token doesn't match the supplied value
IllegalStateException - if this method was called before the stream was started

public String consumeAnyOf(int... typeOptions) throws IllegalStateException
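Chaining these consume methods is the core pattern of a parser built on this class. The following sketch is illustrative only; the statement grammar is hypothetical, and it assumes a TokenStream named tokens that has already been started:

```java
// Parse a hypothetical statement such as: CREATE TABLE customers ;
tokens.consume("CREATE");            // must be the literal "CREATE", else ParsingException
tokens.consume("TABLE");             // chaining works because consume(...) returns this TokenStream
String tableName = tokens.consume(); // consume whatever token comes next and keep its value
tokens.consume(';');                 // expect the terminating symbol
```

A failed expectation raises a ParsingException that identifies the offending token's position, so the calling parser rarely needs its own error bookkeeping.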
Parameters:
typeOptions - the options for the type of the current token
Throws:
ParsingException - if the current token doesn't match the supplied value
IllegalStateException - if this method was called before the stream was started

public String consumeAnyOf(String... options) throws IllegalStateException
Parameters:
options - the options for the value of the current token
Throws:
ParsingException - if the current token doesn't match the supplied value
IllegalStateException - if this method was called before the stream was started

public TokenStream consumeThrough(char expected) throws ParsingException, IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
expected - the token that is to be found
Throws:
ParsingException - if the specified token cannot be found
IllegalStateException - if this method was called before the stream was started

public TokenStream consumeThrough(char expected, char skipMatchingTokens) throws ParsingException, IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
expected - the token that is to be found
skipMatchingTokens - the token that, if found, should result in skipping expected once for each occurrence of skipMatchingTokens; may be null
Throws:
ParsingException - if the specified token cannot be found
IllegalStateException - if this method was called before the stream was started

public TokenStream consumeThrough(String expected) throws ParsingException, IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
expected - the token that is to be found
Throws:
ParsingException - if the specified token cannot be found
IllegalStateException - if this method was called before the stream was started

public TokenStream consumeThrough(String expected, String skipMatchingTokens) throws ParsingException, IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
expected - the token that is to be found
skipMatchingTokens - the token that, if found, should result in skipping expected once for each occurrence of skipMatchingTokens; may be null
Throws:
ParsingException - if the specified token cannot be found
IllegalStateException - if this method was called before the stream was started

public TokenStream consumeUntil(char expected) throws ParsingException, IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
expected - the token that is to be found
Throws:
ParsingException - if the specified token cannot be found
IllegalStateException - if this method was called before the stream was started

public TokenStream consumeUntil(char expected, char skipMatchingTokens) throws ParsingException, IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
expected - the token that is to be found
skipMatchingTokens - the token that, if found, should result in skipping expected once for each occurrence of skipMatchingTokens
Throws:
ParsingException - if the specified token cannot be found
IllegalStateException - if this method was called before the stream was started

public TokenStream consumeUntil(String expected) throws ParsingException, IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
expected - the token that is to be found
Throws:
ParsingException - if the specified token cannot be found
IllegalStateException - if this method was called before the stream was started

public TokenStream consumeUntil(String expected, String... skipMatchingTokens) throws ParsingException, IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
expected - the token that is to be found
skipMatchingTokens - the token that, if found, should result in skipping expected once for each occurrence of skipMatchingTokens; may be null
Throws:
ParsingException - if the specified token cannot be found
IllegalStateException - if this method was called before the stream was started

public TokenStream consumeUntilEndOrOneOf(String... stopTokens) throws ParsingException, IllegalStateException
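The skipMatchingTokens parameter is what lets these methods cope with nested delimiters: each occurrence of the skip token causes one subsequent occurrence of the expected token to be passed over. A sketch, assuming a started TokenStream named tokens positioned just after an opening parenthesis:

```java
// Skip ahead to the ')' that closes the current group; every nested '('
// seen along the way causes one matching ')' to be skipped as well.
tokens.consumeThrough(')', '(');   // leaves the stream just AFTER the ')'

// consumeUntil with the same arguments would instead stop BEFORE the ')',
// leaving it as the current token:
// tokens.consumeUntil(')', '(');
```

The through/until distinction is the only difference between the two families: consumeThrough also consumes the found token, while consumeUntil leaves it as the current token.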
Parameters:
stopTokens - the stop tokens; may not be null
Throws:
ParsingException - if none of the specified tokens can be found
IllegalStateException - if this method was called before the stream was started

public boolean canConsumeInteger(IntConsumer consumer) throws IllegalStateException
Parameters:
consumer - the function that should be called with the integer value if the current token could be parsed
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsumeBoolean(BooleanConsumer consumer) throws IllegalStateException
Parameters:
consumer - the function that should be called with the boolean value if the current token could be parsed
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsumeLong(LongConsumer consumer) throws IllegalStateException
Parameters:
consumer - the function that should be called with the long value if the current token could be parsed
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsume(String expected) throws IllegalStateException
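These typed variants fold the parse-and-consume step into one call, invoking the consumer only when the current token parses as the requested type. A hypothetical sketch (the "LIMIT" clause is invented for illustration), assuming a started TokenStream named tokens and an import of java.util.concurrent.atomic.AtomicInteger:

```java
// Hypothetical optional clause: "LIMIT 100"
AtomicInteger limit = new AtomicInteger(-1);     // -1 means "not specified"
if (tokens.canConsume("LIMIT")) {
    // AtomicInteger::set matches the IntConsumer functional interface
    if (!tokens.canConsumeInteger(limit::set)) {
        // "LIMIT" was present but the next token was not an integer
        throw new IllegalArgumentException("LIMIT requires an integer");
    }
}
```

On failure the stream is not advanced, so the parser can fall through to an alternative interpretation of the token.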
The ANY_VALUE constant can be used in the expected value as a wildcard.
Parameters:
expected - the expected value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsume(int type, String expected) throws IllegalStateException
The ANY_VALUE constant can be used in the expected value as a wildcard.
Parameters:
type - the expected type of the current token
expected - the expected value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsumeWord(String expected) throws IllegalStateException
Attempt to consume the current token if its type is TokenStream.BasicTokenizer.WORD and it matches the expected value, and return whether this method was indeed able to consume the token.
The ANY_VALUE constant can be used in the expected value as a wildcard.
Parameters:
expected - the expected value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsume(char expected) throws IllegalStateException
Parameters:
expected - the expected value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsume(int expectedType) throws IllegalStateException
The ANY_TYPE constant can be used in the expected type as a wildcard.
Parameters:
expectedType - the expected token type of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsume(String currentExpected, String... expectedForNextTokens) throws IllegalStateException
This is not the same as calling canConsume(String) for each of the supplied arguments, since this method ensures that all of the supplied values can be consumed.
This method is equivalent to calling the following:

 if (tokens.matches(currentExpected, expectedForNextTokens)) {
     tokens.consume(currentExpected, expectedForNextTokens);
 }

The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
currentExpected - the expected value of the current token
expectedForNextTokens - the expected values of the following tokens
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsume(int type, String currentExpected, String... expectedForNextTokens) throws IllegalStateException
This is not the same as calling canConsume(int, String) for each of the supplied arguments, since this method ensures that all of the supplied values can be consumed.
This method is equivalent to calling the following:

 if (tokens.matches(currentExpected, expectedForNextTokens) && tokens.matches(type, type, ...)) {
     tokens.consume(currentExpected, expectedForNextTokens);
 }

The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
type - the expected type of the tokens
currentExpected - the expected value of the current token
expectedForNextTokens - the expected values of the following tokens
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsumeWords(String currentExpected, String... expectedForNextTokens) throws IllegalStateException
Attempt to consume the next tokens if and only if they all have type TokenStream.BasicTokenizer.WORD and match the expected values, and return whether this method was indeed able to consume all of the supplied tokens.
Parameters:
currentExpected - the expected value of the current token
expectedForNextTokens - the expected values of the following tokens
Throws:
IllegalStateException - if this method was called before the stream was started
See Also:
canConsume(int, String, String...)

public boolean canConsume(String[] nextTokens) throws IllegalStateException
This is not the same as calling canConsume(String) for each of the supplied arguments, since this method ensures that all of the supplied values can be consumed.
This method is equivalent to calling the following:

 if (tokens.matches(nextTokens)) {
     tokens.consume(nextTokens);
 }

The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
nextTokens - the expected values of the next tokens
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsume(Iterable<String> nextTokens) throws IllegalStateException
This is not the same as calling canConsume(String) for each of the supplied arguments, since this method ensures that all of the supplied values can be consumed.
This method is equivalent to calling the following:

 if (tokens.matches(nextTokens)) {
     tokens.consume(nextTokens);
 }

The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
nextTokens - the expected values of the next tokens
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsumeAnyOf(String firstOption, String... additionalOptions) throws IllegalStateException
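Because the multi-token canConsume variants advance the stream only when the entire sequence matches, they are a natural fit for optional clauses. A sketch with an invented grammar, assuming a started TokenStream named tokens:

```java
// Hypothetical grammar: CREATE TABLE [IF NOT EXISTS] <name>
tokens.consume("CREATE", "TABLE");
boolean failIfExists = true;
if (tokens.canConsume("IF", "NOT", "EXISTS")) {
    // all three tokens matched and were consumed as a unit;
    // a partial match (e.g., just "IF") would have consumed nothing
    failIfExists = false;
}
String tableName = tokens.consume();
```

This all-or-nothing behavior is exactly why the Javadoc stresses that the call is not equivalent to repeated single-token canConsume(String) calls.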
Parameters:
firstOption - the first option for the value of the current token
additionalOptions - the additional options for the value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsumeAnyOf(String[] options) throws IllegalStateException
Parameters:
options - the options for the value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsumeAnyOf(Iterable<String> options) throws IllegalStateException
Parameters:
options - the options for the value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsumeAnyOf(int firstTypeOption, int... additionalTypeOptions) throws IllegalStateException
Parameters:
firstTypeOption - the first option for the type of the current token
additionalTypeOptions - the additional options for the type of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean canConsumeAnyOf(int[] typeOptions) throws IllegalStateException
Parameters:
typeOptions - the options for the type of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matches(String expected) throws IllegalStateException
The ANY_VALUE constant can be used as a wildcard.
Parameters:
expected - the expected value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matches(int type, String expected) throws IllegalStateException
The ANY_VALUE constant can be used as a wildcard.
Parameters:
type - the expected type of the current token
expected - the expected value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matchesWord(String expected) throws IllegalStateException
Determine whether the current token has type TokenStream.BasicTokenizer.WORD and matches the expected value.
The ANY_VALUE constant can be used as a wildcard.
Parameters:
expected - the expected value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matches(char expected) throws IllegalStateException
Parameters:
expected - the expected value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matches(int expectedType) throws IllegalStateException
Parameters:
expectedType - the expected token type of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matches(String currentExpected, String... expectedForNextTokens) throws IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
currentExpected - the expected value of the current token
expectedForNextTokens - the expected values for the following tokens
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matches(String[] nextTokens) throws IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
nextTokens - the expected values of the next tokens
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matches(Iterable<String> nextTokens) throws IllegalStateException
The ANY_VALUE constant can be used in the expected values as a wildcard.
Parameters:
nextTokens - the expected values of the next tokens
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matches(int currentExpectedType, int... expectedTypeForNextTokens) throws IllegalStateException
The ANY_TYPE constant can be used in the expected values as a wildcard.
Parameters:
currentExpectedType - the expected type of the current token
expectedTypeForNextTokens - the expected types for the following tokens
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matches(int[] typesForNextTokens) throws IllegalStateException
The ANY_TYPE constant can be used in the expected values as a wildcard.
Parameters:
typesForNextTokens - the expected type for each of the next tokens
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matchesAnyOf(String firstOption, String... additionalOptions) throws IllegalStateException
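Unlike the canConsume family, matches never advances the stream, which makes it pure lookahead for dispatching between grammar alternatives. A sketch with invented keywords and hypothetical handler methods, assuming a started TokenStream named tokens:

```java
// Dispatch on the leading keywords without consuming anything yet;
// each handler then performs its own consume(...) calls from the start.
if (tokens.matches("CREATE", "TABLE")) {
    parseCreateTable(tokens);        // hypothetical handler
} else if (tokens.matches("DROP", "TABLE")) {
    parseDropTable(tokens);          // hypothetical handler
} else {
    tokens.consumeThrough(';');      // skip an unrecognized statement
}
```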
Parameters:
firstOption - the first option for the value of the current token
additionalOptions - the additional options for the value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matchesAnyOf(int type, String firstOption, String... additionalOptions) throws IllegalStateException
Parameters:
type - the expected type of the tokens
firstOption - the first option for the value of the current token
additionalOptions - the additional options for the value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matchesAnyWordOf(String firstOption, String... additionalOptions) throws IllegalStateException
Determine whether the current token has type TokenStream.BasicTokenizer.WORD and matches any of the supplied options.
Parameters:
firstOption - the first option for the value of the current token
additionalOptions - the additional options for the value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matchesAnyOf(String[] options) throws IllegalStateException
Parameters:
options - the options for the value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matchesAnyOf(Iterable<String> options) throws IllegalStateException
Parameters:
options - the options for the value of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matchesAnyOf(int firstTypeOption, int... additionalTypeOptions) throws IllegalStateException
Parameters:
firstTypeOption - the first option for the type of the current token
additionalTypeOptions - the additional options for the type of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean matchesAnyOf(int[] typeOptions) throws IllegalStateException
Parameters:
typeOptions - the options for the type of the current token
Throws:
IllegalStateException - if this method was called before the stream was started

public boolean hasNext()
Throws:
IllegalStateException - if this method was called before the stream was started

private void moveToNextToken(List<TokenStream.Token> newTokens)

private void moveToNextToken()

final TokenStream.Token currentToken() throws IllegalStateException, NoSuchElementException
Throws:
IllegalStateException - if this method was called before the stream was started
NoSuchElementException - if there are no more tokens

public String getContentFrom(TokenStream.Marker starting)
Parameters:
starting - the marker describing a point in the stream; may not be null

public String getContentBetween(TokenStream.Marker starting, Position end)
Parameters:
starting - the marker describing a point in the stream; may not be null
end - the position located directly after the returned content string; can be null, which means end of content

public String getContentBetween(Position starting, Position end)
Parameters:
starting - the position marking the beginning of the desired content string; may not be null
end - the position located directly after the returned content string; can be null, which means end of content

public final TokenStream.Token previousToken(int count) throws IllegalStateException, NoSuchElementException
Parameters:
count - the number of tokens back from the current position that this method should return
Throws:
IllegalStateException - if this method was called before the stream was started
NoSuchElementException - if there is no previous token

String generateFragment()
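Markers pair naturally with the getContent methods: record a point, walk past a region token by token, then recover the region's raw text. A sketch, assuming a started TokenStream named tokens and a mark() method returning a TokenStream.Marker at the current token (mark() is implied by the rewind(Marker) documentation above but its signature is not shown in this section):

```java
// Remember where a parenthesized expression starts, skip past it,
// then recover the original characters of the whole expression.
TokenStream.Marker start = tokens.mark();   // assumed accessor, see lead-in
tokens.consumeThrough(')', '(');            // handles nested parentheses
String rawExpression = tokens.getContentFrom(start);
```

Because tokens point into the original character stream, the recovered string preserves the source exactly, including whitespace the tokenizer discarded as token boundaries.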
static String generateFragment(String content, int indexOfProblem, int charactersToIncludeBeforeAndAfter, String highlightText)
Parameters:
content - the content from which the fragment should be taken; may not be null
indexOfProblem - the index of the problem point that should be highlighted; must be a valid index in the content
charactersToIncludeBeforeAndAfter - the maximum number of characters before and after the problem point to include in the fragment
highlightText - the text that should be included in the fragment at the problem point to highlight the location, or an empty string if there should be no highlighting

public static TokenStream.BasicTokenizer basicTokenizer(boolean includeComments)
Obtain a basic TokenStream.Tokenizer implementation that ignores whitespace but includes tokens for individual symbols, the period ('.'), single-quoted strings, double-quoted strings, whitespace-delimited words, and optionally comments.
Note that the resulting Tokenizer may not be appropriate in many situations, but is provided merely as a convenience for those situations that happen to be able to use it.
Parameters:
includeComments - true if the comments should be retained and be included in the token stream, or false if comments should be stripped and not included in the token stream

public String getInputString()
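Putting the pieces together, a complete round trip through the class might look like the following sketch. The constructor signature TokenStream(String, Tokenizer, boolean) and the start() method are assumed from the wider class documentation rather than shown in this section, and the statement being parsed is invented:

```java
// Tokenize a small hypothetical statement and walk its tokens
String content = "SELECT name FROM customers;";
TokenStream tokens = new TokenStream(content,
        TokenStream.basicTokenizer(false),  // strip comments from the token stream
        false);                             // assumed case-sensitivity flag
tokens.start();   // required before any consume/matches call, per the
                  // IllegalStateException documented on every method above

tokens.consume("SELECT");
String column = tokens.consume();   // whatever word follows SELECT
tokens.consume("FROM");
String table = tokens.consume();    // whatever word follows FROM
tokens.consume(';');
assert !tokens.hasNext();           // the whole statement was consumed
```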
Copyright © 2021 JBoss by Red Hat. All rights reserved.