All Implemented Interfaces:
org.readium.r2.shared.publication.services.content.ContentTokenizer, org.readium.r2.shared.util.tokenizer.Tokenizer
public final class TextContentTokenizer implements ContentTokenizer
A ContentTokenizer using a TextTokenizer to split the text of the Content.Element into smaller portions.

Constructor Summary
TextContentTokenizer(Language language, TextUnit unit, Boolean overrideContentLanguage)
A ContentTokenizer using the default TextTokenizer to split the text of the Content.Element.
TextContentTokenizer(Language language, Boolean overrideContentLanguage, Integer contextSnippetLength, Function1<Language, Tokenizer<String, IntRange>> textTokenizerFactory)

Method Summary
List<Content.Element> tokenize(Content.Element data)

Constructor Detail
TextContentTokenizer
TextContentTokenizer(Language language, TextUnit unit, Boolean overrideContentLanguage)
A ContentTokenizer using the default TextTokenizer to split the text of the Content.Element.
TextContentTokenizer
TextContentTokenizer(Language language, Boolean overrideContentLanguage, Integer contextSnippetLength, Function1<Language, Tokenizer<String, IntRange>> textTokenizerFactory)
Parameters:
overrideContentLanguage - If true, let language override language information that could be available in content.
contextSnippetLength - Length of before and after snippets in the produced Locators.
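
To make the role of contextSnippetLength and of the text tokenizer function more concrete, the following is a minimal, self-contained Java sketch of the general idea: a text tokenizer returns character ranges, and each range becomes a token with up to contextSnippetLength characters of surrounding text on each side. It does not use the Readium library and is not its implementation. The names SnippetSketch, Token, splitWords and tokenize are hypothetical, and the whitespace-based word splitting is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these types come from Readium.
class SnippetSketch {

    /** Stand-in for one produced element: a slice of text plus its context snippets. */
    static final class Token {
        final String before;
        final String text;
        final String after;

        Token(String before, String text, String after) {
            this.before = before;
            this.text = text;
            this.after = after;
        }
    }

    /** Assumed text tokenizer: returns {start, endExclusive} for each whitespace-separated word. */
    static List<int[]> splitWords(String text) {
        List<int[]> ranges = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip any whitespace preceding the next word.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            // Advance to the end of the word.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) ranges.add(new int[] {start, i});
        }
        return ranges;
    }

    /** Splits text with the given tokenizer and attaches clamped before/after snippets. */
    static List<Token> tokenize(String text, int contextSnippetLength,
                                Function<String, List<int[]>> textTokenizer) {
        List<Token> tokens = new ArrayList<>();
        for (int[] range : textTokenizer.apply(text)) {
            // Clamp the snippet bounds so they never leave the source text.
            int beforeStart = Math.max(0, range[0] - contextSnippetLength);
            int afterEnd = Math.min(text.length(), range[1] + contextSnippetLength);
            tokens.add(new Token(
                    text.substring(beforeStart, range[0]),
                    text.substring(range[0], range[1]),
                    text.substring(range[1], afterEnd)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("The quick brown fox", 4, SnippetSketch::splitWords)) {
            System.out.println("[" + t.before + "] " + t.text + " [" + t.after + "]");
        }
    }
}
```

Passing the tokenizer as a function mirrors the shape of the textTokenizerFactory parameter, which lets a caller swap in a different splitting strategy without changing how context snippets are computed.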
Method Detail
tokenize
List<Content.Element> tokenize(Content.Element data)