Class OperationOcr


  • public class OperationOcr
    extends Object
    The "OCR" web service runs character recognition on PDF documents or images. Images are converted to PDF documents: one page is generated per image, containing the original image and a text layer with the recognized text. Character recognition on PDF documents only works for documents that do not already contain text. These are typically scanned documents that hold a single image per page.
    • Constructor Detail

      • OperationOcr

        public OperationOcr()
    • Method Detail

      • getCheckResolution

        @Nullable
        public @Nullable Boolean getCheckResolution()
        If "true", the DPI resolution of the output file is checked. Resolutions below 200 DPI are rejected by this check because, as a rule, they do not produce good character recognition results.
        Returns:
        checkResolution
      • setCheckResolution

        public void setCheckResolution(Boolean checkResolution)
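        The 200 DPI rule described for checkResolution can be sketched as a local pre-check. This is a minimal illustration, not the web service's implementation: the class ResolutionCheck, its method passes, and the treatment of a null option as "check disabled" are assumptions made for this example.

```java
// Hypothetical helper, not part of the documented API.
final class ResolutionCheck {
    // Threshold taken from the checkResolution description: below 200 DPI is rejected.
    static final int MIN_OCR_DPI = 200;

    // Returns true when the given resolution would pass the check.
    // Assumption: a null or false option means that no check is requested.
    static boolean passes(int dpi, Boolean checkResolution) {
        if (checkResolution == null || !checkResolution) {
            return true;
        }
        return dpi >= MIN_OCR_DPI;
    }
}
```

        With the check enabled, ResolutionCheck.passes(150, true) is false and ResolutionCheck.passes(200, true) is true.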
      • getFailOnWarning

        @Nullable
        public @Nullable Boolean getFailOnWarning()
        If "true", character recognition fails even on warnings that do not prevent recognition but make a meaningful result very unlikely.
        Returns:
        failOnWarning
      • setFailOnWarning

        public void setFailOnWarning(Boolean failOnWarning)
      • getForceEachPage

        @Nullable
        public @Nullable Boolean getForceEachPage()
        If a PDF document contains text on any page, the web service refuses to run character recognition again. If "true" is passed for this option, each page is evaluated individually and character recognition runs on every page that does not contain text (layers), generating a new text layer for those pages.
        Returns:
        forceEachPage
      • setForceEachPage

        public void setForceEachPage(Boolean forceEachPage)
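        The page selection described for forceEachPage can be sketched as follows. This is an illustration of the documented behavior only: the class ForceEachPageSketch, the boolean-array input, the 1-based page numbers, and the exception used for the refusal are all assumptions, and the actual decision is made by the web service.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented API.
final class ForceEachPageSketch {
    // pageHasText[i] tells whether page i + 1 already contains text.
    // Returns the 1-based numbers of the pages that would receive a new text layer.
    static List<Integer> pagesToRecognize(boolean[] pageHasText, boolean forceEachPage) {
        boolean anyText = false;
        for (boolean hasText : pageHasText) {
            anyText |= hasText;
        }
        // Without the option, a document with text on any page is refused as a whole.
        if (anyText && !forceEachPage) {
            throw new IllegalStateException("document already contains text");
        }
        // Otherwise only pages without text are recognized.
        List<Integer> pages = new ArrayList<>();
        for (int i = 0; i < pageHasText.length; i++) {
            if (!pageHasText[i]) {
                pages.add(i + 1);
            }
        }
        return pages;
    }
}
```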
      • getImageDpi

        @Nullable
        public @Nullable Integer getImageDpi()
        Sets the minimum resolution at which images are embedded in the resulting PDF documents. If the value is 0, images are embedded with resolutions and dimensions as close as possible to the original source images.
        Minimum: 0
        Maximum: 9600
        Returns:
        imageDpi
      • setImageDpi

        public void setImageDpi(Integer imageDpi)
      • getJpegQuality

        @Nullable
        public @Nullable Integer getJpegQuality()
        A percentage that sets the compression ratio and influences the quality of JPEG images embedded in the resulting PDF documents. Higher values produce less compressed images of higher quality.
        Minimum: 0
        Maximum: 100
        Returns:
        jpegQuality
      • setJpegQuality

        public void setJpegQuality(Integer jpegQuality)
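        The documented ranges for imageDpi (0 to 9600) and jpegQuality (0 to 100) can be validated on the client before a request is built. The following is a minimal sketch under stated assumptions: the class OcrImageRanges and its methods are hypothetical, and a null value is treated as "not set" because the getters are marked @Nullable.

```java
// Hypothetical helper, not part of the documented API.
final class OcrImageRanges {
    // Bounds taken from the imageDpi and jpegQuality descriptions.
    static final int IMAGE_DPI_MIN = 0;
    static final int IMAGE_DPI_MAX = 9600;
    static final int JPEG_QUALITY_MIN = 0;
    static final int JPEG_QUALITY_MAX = 100;

    // Returns the value unchanged if it is null or within [min, max]; throws otherwise.
    static Integer requireInRange(String name, Integer value, int min, int max) {
        if (value != null && (value < min || value > max)) {
            throw new IllegalArgumentException(
                name + " must be between " + min + " and " + max + ", got " + value);
        }
        return value;
    }

    static Integer checkImageDpi(Integer imageDpi) {
        return requireInRange("imageDpi", imageDpi, IMAGE_DPI_MIN, IMAGE_DPI_MAX);
    }

    static Integer checkJpegQuality(Integer jpegQuality) {
        return requireInRange("jpegQuality", jpegQuality, JPEG_QUALITY_MIN, JPEG_QUALITY_MAX);
    }
}
```

        A checked value could then be passed to the documented setters, for example setJpegQuality(OcrImageRanges.checkJpegQuality(85)).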
      • getLanguage

        @Nullable
        public @Nullable OperationOcr.LanguageEnum getLanguage()
        Specifies the language for the output document (PDF/image). The language must be defined for the character recognition operation (OCR) so that the "special characters" of the respective language (e.g. "üäö" in German) can be recognized better. At present, the following languages are supported:
        * eng = English
        * fra = French
        * spa = Spanish
        * deu = German
        * ita = Italian
        Returns:
        language
      • normalizePageRotation

        public OperationOcr normalizePageRotation(Boolean normalizePageRotation)
      • getNormalizePageRotation

        @Nullable
        public @Nullable Boolean getNormalizePageRotation()
        If "true", the system attempts to rotate pages that contain rotated text so that the text in the document appears "upright".
        Returns:
        normalizePageRotation
      • setNormalizePageRotation

        public void setNormalizePageRotation(Boolean normalizePageRotation)
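        The class lists two ways to set this option: normalizePageRotation(Boolean), which returns OperationOcr, and the void setter setNormalizePageRotation(Boolean). The sketch below shows how such a pair commonly behaves. OperationOcrStandIn is a hypothetical stand-in written for this example, and the assumption that the fluent method returns the same instance is not confirmed by this page.

```java
// Hypothetical stand-in that mirrors only the three documented signatures.
final class OperationOcrStandIn {
    // null means "not set", matching the @Nullable getter.
    private Boolean normalizePageRotation;

    // Fluent variant: stores the value and returns this instance for chaining (assumed).
    OperationOcrStandIn normalizePageRotation(Boolean normalizePageRotation) {
        this.normalizePageRotation = normalizePageRotation;
        return this;
    }

    Boolean getNormalizePageRotation() {
        return normalizePageRotation;
    }

    // Classic setter: stores the value and returns nothing.
    void setNormalizePageRotation(Boolean normalizePageRotation) {
        this.normalizePageRotation = normalizePageRotation;
    }
}
```

        With the documented class, the fluent form would read new OperationOcr().normalizePageRotation(true), based on the constructor and method signature listed on this page.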
      • getOcrMode

        @Nullable
        public @Nullable OperationOcr.OcrModeEnum getOcrMode()
        Specifies the mode used to find structured text on the pages. Each mode places different requirements on the text and makes different assumptions about it.
        * pageSegments = The text on the page is clearly structured and can be decomposed into distinct paragraphs and layout segments. Text elements and lines do not overlap. Headings, and therefore text with differing sizes and typefaces, may be present.
        * column = The text is arranged on the pages in several more or less uniform columns next to each other. Font and text size are mostly uniform.
        * unfiltered = No assumptions are made about the text. Any letters that can be found are recognized as such, regardless of whether they can be assigned to a text column, a line, or even a word. Font size and typeface can vary freely, and text is not necessarily arranged in clearly recognizable columns or according to a fixed layout. Text and lines can overlap. This mode usually recognizes more text (especially with more complex layouts), but it usually also produces the most false detections, since no result is discarded for deviating from the norm.
        Returns:
        ocrMode
      • getOptimization

        @Nullable
        public @Nullable OperationImageOptimization getOptimization()
        Get optimization
        Returns:
        optimization
      • getOutputFormat

        @Nullable
        public @Nullable OperationOcr.OutputFormatEnum getOutputFormat()
        Character recognition can produce different output formats. Generally, the result is a PDF document, but it can also be a plain-text (ASCII) document or an XML document (hOCR) if desired.
        * text = Text
        * hocr = XML (hOCR)
        * pdf = PDF
        Returns:
        outputFormat
      • getPage

        @Nullable
        public @Nullable OperationOcrPage getPage()
        Get page
        Returns:
        page
      • getPdfa

        @Nullable
        public @Nullable OperationPdfa getPdfa()
        Get pdfa
        Returns:
        pdfa
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object