Package net.webpdf.wsclient.openapi
Class OperationExtractionWords
- java.lang.Object
  - net.webpdf.wsclient.openapi.OperationExtractionWords
-
public class OperationExtractionWords extends Object
Extracts all words from the PDF document, along with page and position information. Generates an ASCII text, XML, or JSON file that is returned as the result of the web service call. For each word found, the file contains the page number and the X-axis and Y-axis coordinates of the word. When the TEXT output format is selected, only the text of each word is output, separated by line breaks.
-
-
Nested Class Summary
Modifier and Type | Class | Description
static class | OperationExtractionWords.FileFormatEnum | Used to define the output format for the PDF document text contents being extracted
-
Field Summary
Modifier and Type | Field | Description
static String | JSON_PROPERTY_DELIMIT_AFTER_PUNCTUATION |
static String | JSON_PROPERTY_EXTENDED_SEQUENCE_CHARACTERS |
static String | JSON_PROPERTY_FILE_FORMAT |
static String | JSON_PROPERTY_PAGES |
static String | JSON_PROPERTY_REMOVE_PUNCTUATION |
-
Constructor Summary
Constructor | Description
OperationExtractionWords() |
-
Method Summary
Modifier and Type | Method | Description
OperationExtractionWords | delimitAfterPunctuation(Boolean delimitAfterPunctuation) |
boolean | equals(Object o) |
OperationExtractionWords | extendedSequenceCharacters(Boolean extendedSequenceCharacters) |
OperationExtractionWords | fileFormat(OperationExtractionWords.FileFormatEnum fileFormat) |
@Nullable Boolean | getDelimitAfterPunctuation() | If this attribute is set to true, a new word will be started after each punctuation mark.
@Nullable Boolean | getExtendedSequenceCharacters() | This attribute specifies whether quotation marks and apostrophes should be handled the same way as brackets (such as parentheses and square brackets), i.e., whether they should be placed before the word they enclose.
@Nullable OperationExtractionWords.FileFormatEnum | getFileFormat() | Used to define the output format for the PDF document text contents being extracted
@Nullable String | getPages() | Used to define which page(s) should be used for the extraction mode.
@Nullable Boolean | getRemovePunctuation() | Used to specify whether punctuation marks should be included in the export or whether they should be explicitly removed.
int | hashCode() |
OperationExtractionWords | pages(String pages) |
OperationExtractionWords | removePunctuation(Boolean removePunctuation) |
void | setDelimitAfterPunctuation(Boolean delimitAfterPunctuation) |
void | setExtendedSequenceCharacters(Boolean extendedSequenceCharacters) |
void | setFileFormat(OperationExtractionWords.FileFormatEnum fileFormat) |
void | setPages(String pages) |
void | setRemovePunctuation(Boolean removePunctuation) |
String | toString() |
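Each property in the summary above is exposed twice: once as a fluent method that returns OperationExtractionWords, and once as a bean-style setter that returns void. The sketch below imitates that shape with a hypothetical stand-in class, WordsOptionsSketch, so that it compiles without the wsclient library. It assumes the fluent methods store the value and return the same instance, which this page does not state explicitly.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in that mirrors two documented properties (pages, removePunctuation).
 * It is NOT the library class; it only imitates the documented method shapes.
 */
final class WordsOptionsSketch {
    private String pages;
    private Boolean removePunctuation;

    // Fluent variants: assumed to store the value and return this instance for chaining.
    WordsOptionsSketch pages(String pages) {
        this.pages = pages;
        return this;
    }

    WordsOptionsSketch removePunctuation(Boolean removePunctuation) {
        this.removePunctuation = removePunctuation;
        return this;
    }

    // Bean-style variants: same effect, but no return value.
    void setPages(String pages) {
        this.pages = pages;
    }

    void setRemovePunctuation(Boolean removePunctuation) {
        this.removePunctuation = removePunctuation;
    }

    // In this sketch, getters return null while a property is unset.
    String getPages() {
        return pages;
    }

    Boolean getRemovePunctuation() {
        return removePunctuation;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof WordsOptionsSketch)) return false;
        WordsOptionsSketch other = (WordsOptionsSketch) o;
        return Objects.equals(pages, other.pages)
                && Objects.equals(removePunctuation, other.removePunctuation);
    }

    @Override
    public int hashCode() {
        return Objects.hash(pages, removePunctuation);
    }

    public static void main(String[] args) {
        // Both styles should end up with an equal configuration.
        WordsOptionsSketch fluent = new WordsOptionsSketch().pages("1,5-6,9").removePunctuation(true);
        WordsOptionsSketch bean = new WordsOptionsSketch();
        bean.setPages("1,5-6,9");
        bean.setRemovePunctuation(true);
        System.out.println(fluent.equals(bean)); // prints true
    }
}
```

With the real class, the same two styles would presumably apply to OperationExtractionWords itself; the fluent form is convenient for one-line construction, the setter form for frameworks that expect bean properties.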
-
-
-
Field Detail
-
JSON_PROPERTY_DELIMIT_AFTER_PUNCTUATION
public static final String JSON_PROPERTY_DELIMIT_AFTER_PUNCTUATION
- See Also:
- Constant Field Values
-
JSON_PROPERTY_EXTENDED_SEQUENCE_CHARACTERS
public static final String JSON_PROPERTY_EXTENDED_SEQUENCE_CHARACTERS
- See Also:
- Constant Field Values
-
JSON_PROPERTY_FILE_FORMAT
public static final String JSON_PROPERTY_FILE_FORMAT
- See Also:
- Constant Field Values
-
JSON_PROPERTY_PAGES
public static final String JSON_PROPERTY_PAGES
- See Also:
- Constant Field Values
-
JSON_PROPERTY_REMOVE_PUNCTUATION
public static final String JSON_PROPERTY_REMOVE_PUNCTUATION
- See Also:
- Constant Field Values
-
-
Method Detail
-
delimitAfterPunctuation
public OperationExtractionWords delimitAfterPunctuation(Boolean delimitAfterPunctuation)
-
getDelimitAfterPunctuation
@Nullable public Boolean getDelimitAfterPunctuation()
If this attribute is set to true, a new word will be started after each punctuation mark.
- Returns:
- delimitAfterPunctuation
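To make the effect of this option concrete, the following sketch splits a token so that a new word begins after each punctuation mark. It is only a local illustration: the helper class PunctuationDelimiter is hypothetical, and treating ASCII punctuation (the regex class \p{Punct}) as "punctuation" is an assumption, since this page does not say which characters the service considers punctuation marks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that illustrates "a new word after each punctuation mark". */
final class PunctuationDelimiter {

    private PunctuationDelimiter() {
    }

    /** Splits a token at every position that directly follows an ASCII punctuation character. */
    static List<String> splitAfterPunctuation(String token) {
        List<String> words = new ArrayList<>();
        // Zero-width lookbehind: the punctuation mark stays attached to the preceding piece.
        for (String piece : token.split("(?<=\\p{Punct})")) {
            if (!piece.isEmpty()) {
                words.add(piece);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(splitAfterPunctuation("end.Next")); // prints [end., Next]
        System.out.println(splitAfterPunctuation("a,b;c"));    // prints [a,, b;, c]
    }
}
```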
-
setDelimitAfterPunctuation
public void setDelimitAfterPunctuation(Boolean delimitAfterPunctuation)
-
extendedSequenceCharacters
public OperationExtractionWords extendedSequenceCharacters(Boolean extendedSequenceCharacters)
-
getExtendedSequenceCharacters
@Nullable public Boolean getExtendedSequenceCharacters()
This attribute specifies whether quotation marks and apostrophes should be handled the same way as brackets (such as parentheses and square brackets), i.e., whether they should be placed before the word they enclose.
- Returns:
- extendedSequenceCharacters
-
setExtendedSequenceCharacters
public void setExtendedSequenceCharacters(Boolean extendedSequenceCharacters)
-
fileFormat
public OperationExtractionWords fileFormat(OperationExtractionWords.FileFormatEnum fileFormat)
-
getFileFormat
@Nullable public OperationExtractionWords.FileFormatEnum getFileFormat()
Used to define the output format for the PDF document text contents being extracted.
* text = Text document
* xml = XML document
* json = JSON data structure
- Returns:
- fileFormat
-
setFileFormat
public void setFileFormat(OperationExtractionWords.FileFormatEnum fileFormat)
-
pages
public OperationExtractionWords pages(String pages)
-
getPages
@Nullable public String getPages()
Used to define which page(s) should be used for the extraction. The value can be an individual page, a page range, or a comma-separated list (e.g., "1,5-6,9"). A blank value or "*" selects all pages of the PDF document.
- Returns:
- pages
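The page selection syntax described for this property can be sketched as a small parser. The helper class PageSelection and its pageCount parameter are hypothetical, and sorting and de-duplicating the result is a choice made for this sketch. How the service itself validates or orders the selection is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper that expands a page selection string into page numbers. */
final class PageSelection {

    private PageSelection() {
    }

    /**
     * Expands a selection such as "1,5-6,9" into sorted, de-duplicated page numbers.
     * A null, blank, or "*" selection yields every page from 1 to pageCount.
     */
    static List<Integer> expand(String selection, int pageCount) {
        SortedSet<Integer> pages = new TreeSet<>();
        if (selection == null || selection.trim().isEmpty() || selection.trim().equals("*")) {
            for (int p = 1; p <= pageCount; p++) {
                pages.add(p);
            }
            return new ArrayList<>(pages);
        }
        for (String part : selection.split(",")) {
            String token = part.trim();
            int dash = token.indexOf('-');
            // A token is either a single page ("9") or an inclusive range ("5-6").
            int first = Integer.parseInt(dash < 0 ? token : token.substring(0, dash).trim());
            int last = dash < 0 ? first : Integer.parseInt(token.substring(dash + 1).trim());
            for (int p = first; p <= last; p++) {
                pages.add(p);
            }
        }
        return new ArrayList<>(pages);
    }

    public static void main(String[] args) {
        System.out.println(expand("1,5-6,9", 10)); // prints [1, 5, 6, 9]
        System.out.println(expand("*", 3));        // prints [1, 2, 3]
    }
}
```

Malformed tokens make Integer.parseInt throw a NumberFormatException in this sketch; a production helper would likely report such input more gracefully.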
-
setPages
public void setPages(String pages)
-
removePunctuation
public OperationExtractionWords removePunctuation(Boolean removePunctuation)
-
getRemovePunctuation
@Nullable public Boolean getRemovePunctuation()
Used to specify whether punctuation marks should be included in the export or whether they should be explicitly removed.
- Returns:
- removePunctuation
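As a rough local illustration of this option, the sketch below strips punctuation from a list of already extracted words when the flag is set. The helper class PunctuationFilter is hypothetical. Limiting "punctuation" to ASCII characters (\p{Punct}) and dropping words that become empty are both assumptions of this sketch, not documented behavior of the service.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that illustrates the effect of removing punctuation from words. */
final class PunctuationFilter {

    private PunctuationFilter() {
    }

    /** Returns a copy of words; if removePunctuation is true, ASCII punctuation is stripped. */
    static List<String> apply(List<String> words, boolean removePunctuation) {
        if (!removePunctuation) {
            return new ArrayList<>(words);
        }
        List<String> result = new ArrayList<>();
        for (String word : words) {
            String stripped = word.replaceAll("\\p{Punct}", "");
            // Words that consisted only of punctuation are dropped in this sketch.
            if (!stripped.isEmpty()) {
                result.add(stripped);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("Hello,", "(note)", "-");
        System.out.println(apply(words, true));  // prints [Hello, note]
        System.out.println(apply(words, false)); // prints [Hello,, (note), -]
    }
}
```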
-
setRemovePunctuation
public void setRemovePunctuation(Boolean removePunctuation)
-
-