Class ExtractionWordsType


  • public class ExtractionWordsType
    extends BaseExtractionType
     Extracts all words from the PDF document, together with page and position information.
     
     Generates an ASCII text, XML, or JSON file that is returned as the result of the web service call. For each word found, the file contains the page number and the X and Y coordinates of the word. When the TEXT output format is selected, only the text of each word is output, separated by line breaks.
     

    Java class for ExtractionWordsType complex type.

    The following schema fragment specifies the expected content contained within this class.

    
     <complexType name="ExtractionWordsType">
       <complexContent>
         <extension base="{http://schema.webpdf.de/1.0/operation}BaseExtractionType">
           <attribute name="removePunctuation" type="{http://www.w3.org/2001/XMLSchema}boolean" default="false" />
           <attribute name="delimitAfterPunctuation" type="{http://www.w3.org/2001/XMLSchema}boolean" default="true" />
           <attribute name="extendedSequenceCharacters" type="{http://www.w3.org/2001/XMLSchema}boolean" default="false" />
         </extension>
       </complexContent>
     </complexType>
     
    • Field Detail

      • removePunctuation

        protected Boolean removePunctuation
         Specifies whether punctuation marks are included in the export or explicitly removed.
         
      • delimitAfterPunctuation

        protected Boolean delimitAfterPunctuation
         If this attribute is set to true, a new word is started after each punctuation mark.
         
      • extendedSequenceCharacters

        protected Boolean extendedSequenceCharacters
         Specifies whether quotation marks and apostrophes are handled the same way as brackets (such as parentheses and square brackets), i.e., whether they are placed before the word they enclose.
         
    • Constructor Detail

      • ExtractionWordsType

        public ExtractionWordsType()
    • Method Detail

      • isRemovePunctuation

        public boolean isRemovePunctuation()
         Specifies whether punctuation marks are included in the export or explicitly removed.
         
        Returns:
        the value of the removePunctuation property (schema default: false)
      • setRemovePunctuation

        public void setRemovePunctuation(boolean value)
        Sets the value of the removePunctuation property.
        Parameters:
        value - the new boolean value
        See Also:
        isRemovePunctuation()
      • isSetRemovePunctuation

        public boolean isSetRemovePunctuation()
      • unsetRemovePunctuation

        public void unsetRemovePunctuation()
      • isDelimitAfterPunctuation

        public boolean isDelimitAfterPunctuation()
         If this attribute is set to true, a new word is started after each punctuation mark.
         
        Returns:
        the value of the delimitAfterPunctuation property (schema default: true)
      • setDelimitAfterPunctuation

        public void setDelimitAfterPunctuation(boolean value)
        Sets the value of the delimitAfterPunctuation property.
        Parameters:
        value - the new boolean value
        See Also:
        isDelimitAfterPunctuation()
      • isSetDelimitAfterPunctuation

        public boolean isSetDelimitAfterPunctuation()
      • unsetDelimitAfterPunctuation

        public void unsetDelimitAfterPunctuation()
      • isExtendedSequenceCharacters

        public boolean isExtendedSequenceCharacters()
         Specifies whether quotation marks and apostrophes are handled the same way as brackets (such as parentheses and square brackets), i.e., whether they are placed before the word they enclose.
         
        Returns:
        the value of the extendedSequenceCharacters property (schema default: false)
      • setExtendedSequenceCharacters

        public void setExtendedSequenceCharacters(boolean value)
        Sets the value of the extendedSequenceCharacters property.
        Parameters:
        value - the new boolean value
        See Also:
        isExtendedSequenceCharacters()
      • isSetExtendedSequenceCharacters

        public boolean isSetExtendedSequenceCharacters()
      • unsetExtendedSequenceCharacters

        public void unsetExtendedSequenceCharacters()
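
    • Usage sketch

      The accessor quartet documented above (isX, setX, isSetX, unsetX) over a boxed Boolean field can be illustrated with a small stand-in class. This is a minimal sketch, not the real webPDF class: WordsOptions and ExtractionWordsSketch are hypothetical names, and the fallback to the schema default when a property is unset is an assumption based on how JAXB-generated accessors for defaulted attributes commonly behave. Only two of the three properties are shown; the third follows the same pattern.

```java
// Hypothetical stand-in mirroring the accessor pattern of ExtractionWordsType.
// NOT the real webPDF class; in real code use the generated ExtractionWordsType.
public class ExtractionWordsSketch {

    public static class WordsOptions {
        // Boxed Boolean so that "unset" (null) differs from an explicit value.
        protected Boolean removePunctuation;
        protected Boolean delimitAfterPunctuation;

        // Assumption: an unset property reports the schema default (false).
        public boolean isRemovePunctuation() {
            return removePunctuation != null && removePunctuation;
        }

        public void setRemovePunctuation(boolean value) {
            this.removePunctuation = value;
        }

        public boolean isSetRemovePunctuation() {
            return removePunctuation != null;
        }

        public void unsetRemovePunctuation() {
            this.removePunctuation = null;
        }

        // Assumption: an unset property reports the schema default (true).
        public boolean isDelimitAfterPunctuation() {
            return delimitAfterPunctuation == null || delimitAfterPunctuation;
        }

        public void setDelimitAfterPunctuation(boolean value) {
            this.delimitAfterPunctuation = value;
        }

        public boolean isSetDelimitAfterPunctuation() {
            return delimitAfterPunctuation != null;
        }

        public void unsetDelimitAfterPunctuation() {
            this.delimitAfterPunctuation = null;
        }
    }

    public static void main(String[] args) {
        WordsOptions options = new WordsOptions();

        // Nothing set yet: the getters report the assumed schema defaults.
        System.out.println(options.isSetRemovePunctuation());
        System.out.println(options.isRemovePunctuation());
        System.out.println(options.isDelimitAfterPunctuation());

        // An explicit value marks the property as set.
        options.setRemovePunctuation(true);
        System.out.println(options.isSetRemovePunctuation());

        // Unsetting clears the explicit value again.
        options.unsetRemovePunctuation();
        System.out.println(options.isSetRemovePunctuation());
    }
}
```

      The isSetX methods are useful when a caller needs to distinguish "explicitly set to the default value" from "not specified at all", which the plain boolean getter cannot express.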