Class OperationExtractionWords


  • public class OperationExtractionWords
    extends Object
    Extracts all words from the PDF document, together with page and position information. Generates an ASCII text, XML, or JSON file that is returned as the result when the web service is called. For each word found, the file contains the page number and the X-axis and Y-axis coordinates of the word. When the TEXT output format is selected, only each word's text is output, one word per line.
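    The TEXT output format described above can be illustrated with a minimal sketch. It assumes a hypothetical ExtractedWord record holding the documented per-word information (page number, X and Y coordinates, text) and a hypothetical toTextOutput helper that keeps only the text, one word per line. Neither name belongs to the client API, and the XML and JSON layouts are not shown on this page, so they are not modeled here.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch only: ExtractedWord and toTextOutput are hypothetical, not part of the client API. */
public class WordTextOutputSketch {

    /** One extracted word: page number, X-axis and Y-axis coordinates, and the word's text. */
    public record ExtractedWord(int page, double x, double y, String text) {}

    /** TEXT format: positions are dropped and each word's text goes on its own line. */
    public static String toTextOutput(List<ExtractedWord> words) {
        return words.stream()
                .map(ExtractedWord::text)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<ExtractedWord> words = List.of(
                new ExtractedWord(1, 72.0, 700.5, "Hello"),
                new ExtractedWord(1, 110.2, 700.5, "world"));
        // Prints "Hello" and "world" on separate lines.
        System.out.println(toTextOutput(words));
    }
}
```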
    • Constructor Detail

      • OperationExtractionWords

        public OperationExtractionWords()
    • Method Detail

      • getDelimitAfterPunctuation

        @Nullable
        public @Nullable Boolean getDelimitAfterPunctuation()
        If this attribute is set to true, a new word will be started after each punctuation mark.
        Returns:
        delimitAfterPunctuation
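        The effect of delimitAfterPunctuation can be pictured with the minimal sketch below. It is an illustration of the documented rule ("a new word will be started after each punctuation mark"), not the web service's actual word segmentation. It assumes the input is a run of characters without whitespace and treats only ASCII punctuation (Java's \p{Punct} class) as punctuation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of delimitAfterPunctuation; not the service's real tokenizer. */
public class DelimitAfterPunctuationSketch {

    /**
     * Splits a whitespace-free character run into words. When delimitAfterPunctuation is true,
     * a new word starts after every punctuation mark; otherwise the run stays a single word.
     */
    public static List<String> split(String run, boolean delimitAfterPunctuation) {
        List<String> words = new ArrayList<>();
        if (!delimitAfterPunctuation) {
            words.add(run);
            return words;
        }
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < run.length(); i++) {
            char c = run.charAt(i);
            current.append(c);
            // The punctuation mark stays with the preceding word; the next character starts a new one.
            if (isPunctuation(c)) {
                words.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    /** Assumption: "punctuation" means the ASCII \p{Punct} character class. */
    static boolean isPunctuation(char c) {
        return String.valueOf(c).matches("\\p{Punct}");
    }

    public static void main(String[] args) {
        System.out.println(split("end.Next,word", true));  // [end., Next,, word]
        System.out.println(split("end.Next,word", false)); // [end.Next,word]
    }
}
```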
      • setDelimitAfterPunctuation

        public void setDelimitAfterPunctuation(Boolean delimitAfterPunctuation)
      • getExtendedSequenceCharacters

        @Nullable
        public @Nullable Boolean getExtendedSequenceCharacters()
        This attribute specifies whether quotation marks and apostrophes should be handled the same way as brackets (such as parentheses and square brackets), i.e., whether they should be placed before the word they enclose.
        Returns:
        extendedSequenceCharacters
      • setExtendedSequenceCharacters

        public void setExtendedSequenceCharacters(Boolean extendedSequenceCharacters)
      • getFileFormat

        @Nullable
        public @Nullable OperationExtractionWords.FileFormatEnum getFileFormat()
        Defines the output format for the text contents extracted from the PDF document:
        * text = text document
        * xml = XML document
        * json = JSON data structure
        Returns:
        fileFormat
      • getPages

        @Nullable
        public @Nullable String getPages()
        Defines which page(s) of the PDF document are used for the extraction. The value can be an individual page, a page range, or a comma-separated list of pages and ranges (e.g., "1,5-6,9"). A blank value or "*" selects all pages of the PDF document.
        Returns:
        pages
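        To show how a page selection such as "1,5-6,9" reads, here is a minimal sketch that expands it into 1-based page numbers, with a blank value or "*" meaning every page. The class name, the parse method, and the pageCount parameter are hypothetical; the sketch does no validation against the real document's page count and is not the service's own parser.

```java
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical expander for page selections like "1,5-6,9"; not the service's parser. */
public class PageSelectionSketch {

    /**
     * Expands a page selection into 1-based page numbers.
     * A null, blank, or "*" value selects every page of a document with pageCount pages.
     */
    public static SortedSet<Integer> parse(String pages, int pageCount) {
        SortedSet<Integer> result = new TreeSet<>();
        if (pages == null || pages.isBlank() || pages.trim().equals("*")) {
            for (int p = 1; p <= pageCount; p++) {
                result.add(p);
            }
            return result;
        }
        for (String part : pages.split(",")) {
            String token = part.trim();
            int dash = token.indexOf('-');
            if (dash >= 0) {
                // Page range "from-to", inclusive on both ends.
                int from = Integer.parseInt(token.substring(0, dash).trim());
                int to = Integer.parseInt(token.substring(dash + 1).trim());
                for (int p = from; p <= to; p++) {
                    result.add(p);
                }
            } else {
                // Individual page number.
                result.add(Integer.parseInt(token));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("1,5-6,9", 10)); // [1, 5, 6, 9]
        System.out.println(parse("*", 3));        // [1, 2, 3]
    }
}
```

A TreeSet is used so that overlapping entries (e.g., "5,5-6") collapse into one sorted set of pages.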
      • setPages

        public void setPages(String pages)
      • getRemovePunctuation

        @Nullable
        public @Nullable Boolean getRemovePunctuation()
        Specifies whether punctuation marks are explicitly removed from the export (true) or included in it.
        Returns:
        removePunctuation
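        A minimal sketch of what removePunctuation changes for a single word follows. It assumes that "punctuation" means the ASCII \p{Punct} character class; the class and method names are hypothetical, and how the service treats words that consist only of punctuation is not stated on this page, so the sketch simply returns an empty string for them.

```java
/** Hypothetical illustration of removePunctuation; not the service's real implementation. */
public class RemovePunctuationSketch {

    /** Returns the word unchanged, or with punctuation stripped when removePunctuation is true. */
    public static String apply(String word, boolean removePunctuation) {
        // Assumption: punctuation is the ASCII \p{Punct} class (quotes, commas, periods, ...).
        return removePunctuation ? word.replaceAll("\\p{Punct}", "") : word;
    }

    public static void main(String[] args) {
        System.out.println(apply("\"Hello,\"", true));  // Hello
        System.out.println(apply("\"Hello,\"", false)); // "Hello,"
    }
}
```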
      • setRemovePunctuation

        public void setRemovePunctuation(Boolean removePunctuation)
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object
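        This page does not show how hashCode is computed. A common pattern for option beans like this one is to hash exactly the fields that equals compares, so that equal option sets produce equal hash codes. The sketch below assumes that pattern on a hypothetical stand-in class, WordsOptionsSketch, that mirrors the documented attributes; it is not the real OperationExtractionWords implementation, and fileFormat is simplified to a String.

```java
import java.util.Objects;

/** Hypothetical stand-in mirroring the documented attributes; not the real client class. */
public class WordsOptionsSketch {
    private final Boolean delimitAfterPunctuation;
    private final Boolean extendedSequenceCharacters;
    private final String fileFormat; // simplified: the real getter returns OperationExtractionWords.FileFormatEnum
    private final String pages;
    private final Boolean removePunctuation;

    public WordsOptionsSketch(Boolean delimitAfterPunctuation, Boolean extendedSequenceCharacters,
                              String fileFormat, String pages, Boolean removePunctuation) {
        this.delimitAfterPunctuation = delimitAfterPunctuation;
        this.extendedSequenceCharacters = extendedSequenceCharacters;
        this.fileFormat = fileFormat;
        this.pages = pages;
        this.removePunctuation = removePunctuation;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof WordsOptionsSketch)) return false;
        WordsOptionsSketch other = (WordsOptionsSketch) o;
        // Objects.equals is null-safe, which matters because every getter is @Nullable.
        return Objects.equals(delimitAfterPunctuation, other.delimitAfterPunctuation)
                && Objects.equals(extendedSequenceCharacters, other.extendedSequenceCharacters)
                && Objects.equals(fileFormat, other.fileFormat)
                && Objects.equals(pages, other.pages)
                && Objects.equals(removePunctuation, other.removePunctuation);
    }

    @Override
    public int hashCode() {
        // Hash the same fields equals compares; Objects.hash also tolerates null values.
        return Objects.hash(delimitAfterPunctuation, extendedSequenceCharacters, fileFormat, pages, removePunctuation);
    }

    public static void main(String[] args) {
        WordsOptionsSketch a = new WordsOptionsSketch(true, null, "json", "1,5-6,9", false);
        WordsOptionsSketch b = new WordsOptionsSketch(true, null, "json", "1,5-6,9", false);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
    }
}
```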