Package net.webpdf.wsclient.openapi
Class OperationOcr
- java.lang.Object
-
- net.webpdf.wsclient.openapi.OperationOcr
-
public class OperationOcr extends Object
The "OCR" web service runs character recognition on PDF documents or images. Images are converted to PDF documents: one page is generated per image, and that page contains the original image plus a text layer with the recognized text. Character recognition on PDF documents only works with documents that do not already contain text. These are typically documents generated by scanners, with a single image per page.
-
-
Nested Class Summary
Nested Classes
- static class OperationOcr.LanguageEnum: Used to specify the language for the output document (PDF/image).
- static class OperationOcr.OcrModeEnum: Specifies the mode used to find structured text on the pages.
- static class OperationOcr.OutputFormatEnum: Different output formats can be created during character recognition.
-
Field Summary
Fields
- static String JSON_PROPERTY_CHECK_RESOLUTION
- static String JSON_PROPERTY_FAIL_ON_WARNING
- static String JSON_PROPERTY_FORCE_EACH_PAGE
- static String JSON_PROPERTY_IMAGE_DPI
- static String JSON_PROPERTY_JPEG_QUALITY
- static String JSON_PROPERTY_LANGUAGE
- static String JSON_PROPERTY_NORMALIZE_PAGE_ROTATION
- static String JSON_PROPERTY_OCR_MODE
- static String JSON_PROPERTY_OPTIMIZATION
- static String JSON_PROPERTY_OUTPUT_FORMAT
- static String JSON_PROPERTY_PAGE
- static String JSON_PROPERTY_PDFA
-
Constructor Summary
Constructors
- OperationOcr()
-
Method Summary
Methods
- OperationOcr checkResolution(Boolean checkResolution)
- boolean equals(Object o)
- OperationOcr failOnWarning(Boolean failOnWarning)
- OperationOcr forceEachPage(Boolean forceEachPage)
- @Nullable Boolean getCheckResolution(): If "true", then the DPI resolution of the output file will be checked.
- @Nullable Boolean getFailOnWarning(): If "true", character recognition will fail even in the event of warnings that do not prevent recognition, but that make it very unlikely for a meaningful result to be generated.
- @Nullable Boolean getForceEachPage(): If a PDF document contains text content on any page, the web service will refuse to run character recognition again.
- @Nullable Integer getImageDpi(): Used to set the minimum resolution images will be embedded with in resulting PDF documents.
- @Nullable Integer getJpegQuality(): A percentage that sets the compression ratio and influences the quality of JPEG images that shall be embedded in resulting PDF documents.
- @Nullable OperationOcr.LanguageEnum getLanguage(): Used to specify the language for the output document (PDF/image).
- @Nullable Boolean getNormalizePageRotation(): If "true", then, for the recognition of a rotated text, the system will attempt to rotate the page in such a way that the text in the document will not appear to be rotated and will be shown "upright."
- @Nullable OperationOcr.OcrModeEnum getOcrMode(): Specifies the mode used to find structured text on the pages.
- @Nullable OperationImageOptimization getOptimization(): Get optimization
- @Nullable OperationOcr.OutputFormatEnum getOutputFormat(): Different output formats can be created during character recognition.
- @Nullable OperationOcrPage getPage(): Get page
- @Nullable OperationPdfa getPdfa(): Get pdfa
- int hashCode()
- OperationOcr imageDpi(Integer imageDpi)
- OperationOcr jpegQuality(Integer jpegQuality)
- OperationOcr language(OperationOcr.LanguageEnum language)
- OperationOcr normalizePageRotation(Boolean normalizePageRotation)
- OperationOcr ocrMode(OperationOcr.OcrModeEnum ocrMode)
- OperationOcr optimization(OperationImageOptimization optimization)
- OperationOcr outputFormat(OperationOcr.OutputFormatEnum outputFormat)
- OperationOcr page(OperationOcrPage page)
- OperationOcr pdfa(OperationPdfa pdfa)
- void setCheckResolution(Boolean checkResolution)
- void setFailOnWarning(Boolean failOnWarning)
- void setForceEachPage(Boolean forceEachPage)
- void setImageDpi(Integer imageDpi)
- void setJpegQuality(Integer jpegQuality)
- void setLanguage(OperationOcr.LanguageEnum language)
- void setNormalizePageRotation(Boolean normalizePageRotation)
- void setOcrMode(OperationOcr.OcrModeEnum ocrMode)
- void setOptimization(OperationImageOptimization optimization)
- void setOutputFormat(OperationOcr.OutputFormatEnum outputFormat)
- void setPage(OperationOcrPage page)
- void setPdfa(OperationPdfa pdfa)
- String toString()
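The Method Summary lists each property three times: a fluent method that returns OperationOcr, a nullable getter, and a void setter. The sketch below shows how such a trio usually relates in generated model classes. OcrSettingsSketch is a hypothetical stand-in written for illustration only; the method bodies are an assumption and are not taken from the real OperationOcr source.

```java
// Hypothetical stand-in that mirrors the method shapes in the summary.
// It is NOT the real net.webpdf.wsclient.openapi.OperationOcr class.
final class OcrSettingsSketch {
    private Boolean checkResolution; // null means "not set"
    private Integer imageDpi;        // null means "not set"

    // Fluent variant: stores the value and returns this object for chaining.
    OcrSettingsSketch checkResolution(Boolean checkResolution) {
        this.checkResolution = checkResolution;
        return this;
    }

    OcrSettingsSketch imageDpi(Integer imageDpi) {
        this.imageDpi = imageDpi;
        return this;
    }

    // Bean-style setters: store the value and return nothing.
    void setCheckResolution(Boolean checkResolution) {
        this.checkResolution = checkResolution;
    }

    void setImageDpi(Integer imageDpi) {
        this.imageDpi = imageDpi;
    }

    // Getters may return null when the property was never set.
    Boolean getCheckResolution() {
        return checkResolution;
    }

    Integer getImageDpi() {
        return imageDpi;
    }
}
```

With this shape, new OcrSettingsSketch().checkResolution(true).imageDpi(300) configures two properties in one expression, while the setters suit frameworks that expect bean-style accessors.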
-
-
-
Field Detail
-
JSON_PROPERTY_CHECK_RESOLUTION
public static final String JSON_PROPERTY_CHECK_RESOLUTION
- See Also:
- Constant Field Values
-
JSON_PROPERTY_FAIL_ON_WARNING
public static final String JSON_PROPERTY_FAIL_ON_WARNING
- See Also:
- Constant Field Values
-
JSON_PROPERTY_FORCE_EACH_PAGE
public static final String JSON_PROPERTY_FORCE_EACH_PAGE
- See Also:
- Constant Field Values
-
JSON_PROPERTY_IMAGE_DPI
public static final String JSON_PROPERTY_IMAGE_DPI
- See Also:
- Constant Field Values
-
JSON_PROPERTY_JPEG_QUALITY
public static final String JSON_PROPERTY_JPEG_QUALITY
- See Also:
- Constant Field Values
-
JSON_PROPERTY_LANGUAGE
public static final String JSON_PROPERTY_LANGUAGE
- See Also:
- Constant Field Values
-
JSON_PROPERTY_NORMALIZE_PAGE_ROTATION
public static final String JSON_PROPERTY_NORMALIZE_PAGE_ROTATION
- See Also:
- Constant Field Values
-
JSON_PROPERTY_OCR_MODE
public static final String JSON_PROPERTY_OCR_MODE
- See Also:
- Constant Field Values
-
JSON_PROPERTY_OPTIMIZATION
public static final String JSON_PROPERTY_OPTIMIZATION
- See Also:
- Constant Field Values
-
JSON_PROPERTY_OUTPUT_FORMAT
public static final String JSON_PROPERTY_OUTPUT_FORMAT
- See Also:
- Constant Field Values
-
JSON_PROPERTY_PAGE
public static final String JSON_PROPERTY_PAGE
- See Also:
- Constant Field Values
-
JSON_PROPERTY_PDFA
public static final String JSON_PROPERTY_PDFA
- See Also:
- Constant Field Values
-
-
Method Detail
-
checkResolution
public OperationOcr checkResolution(Boolean checkResolution)
-
getCheckResolution
@Nullable public @Nullable Boolean getCheckResolution()
If "true", the DPI resolution of the output file is checked. Resolutions below 200 DPI are rejected in this check because, as a rule, they do not produce good character recognition results.
- Returns:
- checkResolution
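The rule described for getCheckResolution() can be expressed as a small predicate. This is a sketch of the documented rule only, not the service's implementation. ResolutionCheckSketch and isAccepted are hypothetical names, and treating a null flag as "check disabled" is an assumption.

```java
// Hypothetical helper that models the documented 200 DPI rule.
final class ResolutionCheckSketch {
    // Threshold from the description: resolutions below 200 DPI are rejected.
    static final int MIN_DPI = 200;

    // Returns true if a document with the given resolution would pass.
    // Assumption: a null or false flag means the check is not performed.
    static boolean isAccepted(int dpi, Boolean checkResolution) {
        if (!Boolean.TRUE.equals(checkResolution)) {
            return true; // check disabled, nothing is rejected
        }
        return dpi >= MIN_DPI;
    }
}
```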
-
setCheckResolution
public void setCheckResolution(Boolean checkResolution)
-
failOnWarning
public OperationOcr failOnWarning(Boolean failOnWarning)
-
getFailOnWarning
@Nullable public @Nullable Boolean getFailOnWarning()
If "true", character recognition fails even on warnings that do not prevent recognition but make a meaningful result very unlikely.
- Returns:
- failOnWarning
-
setFailOnWarning
public void setFailOnWarning(Boolean failOnWarning)
-
forceEachPage
public OperationOcr forceEachPage(Boolean forceEachPage)
-
getForceEachPage
@Nullable public @Nullable Boolean getForceEachPage()
If a PDF document contains text content on any page, the web service will refuse to run character recognition again. If "true" is passed for this option, however, every page in the document is considered individually, and character recognition is run on each page that does not contain text (layers), so that a new text layer is generated for it.
- Returns:
- forceEachPage
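The page selection described for getForceEachPage() can be modeled as follows. This is a minimal sketch of the documented behavior, not the service's code. ForceEachPageSketch and pagesToRecognize are hypothetical names, and IllegalStateException merely stands in for the refusal, since the actual error type is not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the documented forceEachPage behavior.
final class ForceEachPageSketch {
    // pageHasText.get(i) is true if page i already contains text.
    // Returns the zero-based indexes of pages on which recognition would run.
    // Throws IllegalStateException to model the service refusing the document.
    static List<Integer> pagesToRecognize(List<Boolean> pageHasText, Boolean forceEachPage) {
        boolean force = Boolean.TRUE.equals(forceEachPage);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < pageHasText.size(); i++) {
            if (pageHasText.get(i)) {
                if (!force) {
                    // Without the option, any existing text blocks the whole run.
                    throw new IllegalStateException("document already contains text");
                }
                // With the option, pages that already have text are skipped.
            } else {
                result.add(i);
            }
        }
        return result;
    }
}
```

For a three-page document whose second page already has text, the sketch selects pages 0 and 2 when the option is "true" and refuses the document otherwise.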
-
setForceEachPage
public void setForceEachPage(Boolean forceEachPage)
-
imageDpi
public OperationOcr imageDpi(Integer imageDpi)
-
getImageDpi
@Nullable public @Nullable Integer getImageDpi()
Sets the minimum resolution at which images are embedded in the resulting PDF documents. When this parameter is set to 0, images are embedded with resolutions and dimensions as close as possible to those of the original source images. Minimum: 0. Maximum: 9600.
- Returns:
- imageDpi
-
setImageDpi
public void setImageDpi(Integer imageDpi)
-
jpegQuality
public OperationOcr jpegQuality(Integer jpegQuality)
-
getJpegQuality
@Nullable public @Nullable Integer getJpegQuality()
A percentage that sets the compression ratio and influences the quality of JPEG images embedded in the resulting PDF documents. Higher values result in less compressed images of higher quality. Minimum: 0. Maximum: 100.
- Returns:
- jpegQuality
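Both imageDpi (0 to 9600) and jpegQuality (0 to 100) have documented bounds, so a client may want to validate values before sending a request. The helper below is a sketch under that assumption. OcrRangeSketch is a hypothetical class, and accepting null as "not set" is an assumption based on the getters being @Nullable.

```java
// Hypothetical client-side validation of the documented value ranges.
final class OcrRangeSketch {
    // imageDpi: minimum 0, maximum 9600; null is treated as "not set".
    static boolean isValidImageDpi(Integer imageDpi) {
        return imageDpi == null || (imageDpi >= 0 && imageDpi <= 9600);
    }

    // jpegQuality: minimum 0, maximum 100; null is treated as "not set".
    static boolean isValidJpegQuality(Integer jpegQuality) {
        return jpegQuality == null || (jpegQuality >= 0 && jpegQuality <= 100);
    }
}
```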
-
setJpegQuality
public void setJpegQuality(Integer jpegQuality)
-
language
public OperationOcr language(OperationOcr.LanguageEnum language)
-
getLanguage
@Nullable public @Nullable OperationOcr.LanguageEnum getLanguage()
Used to specify the language for the output document (PDF/image). The language must be defined for the character recognition operation (OCR) so that the "special characters" of the respective language (e.g. "üäö" in German) can be recognized better. At present, the following languages are supported:
* eng = English
* fra = French
* spa = Spanish
* deu = German
* ita = Italian
- Returns:
- language
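The language codes listed for getLanguage() can be kept in a lookup table, for example to show readable names in a user interface. This is an illustrative sketch. OcrLanguageSketch and describe are hypothetical names, and the constants of OperationOcr.LanguageEnum itself are not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup of the documented language codes.
final class OcrLanguageSketch {
    // Insertion order follows the list in the description.
    static final Map<String, String> LANGUAGES = new LinkedHashMap<>();

    static {
        LANGUAGES.put("eng", "English");
        LANGUAGES.put("fra", "French");
        LANGUAGES.put("spa", "Spanish");
        LANGUAGES.put("deu", "German");
        LANGUAGES.put("ita", "Italian");
    }

    // Returns the readable name for a code, or throws for unsupported codes.
    static String describe(String code) {
        String name = LANGUAGES.get(code);
        if (name == null) {
            throw new IllegalArgumentException("Unsupported OCR language: " + code);
        }
        return name;
    }
}
```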
-
setLanguage
public void setLanguage(OperationOcr.LanguageEnum language)
-
normalizePageRotation
public OperationOcr normalizePageRotation(Boolean normalizePageRotation)
-
getNormalizePageRotation
@Nullable public @Nullable Boolean getNormalizePageRotation()
If "true", then when rotated text is recognized, the system attempts to rotate the page so that the text in the document no longer appears rotated and is shown "upright."
- Returns:
- normalizePageRotation
-
setNormalizePageRotation
public void setNormalizePageRotation(Boolean normalizePageRotation)
-
ocrMode
public OperationOcr ocrMode(OperationOcr.OcrModeEnum ocrMode)
-
getOcrMode
@Nullable public @Nullable OperationOcr.OcrModeEnum getOcrMode()
Specifies the mode used to find structured text on the pages. Depending on which mode is chosen, different requirements are set for the text and different assumptions are made about it.
* pageSegments = The text on the page is clearly structured and can be decomposed into distinct paragraphs and layout segments. Text elements and lines do not overlap. Headings, and therefore text with differing sizes and fonts, may be present.
* column = The text is arranged on the pages in several more or less uniform columns next to each other. Font and text size are mostly uniform.
* unfiltered = No assumptions are made about the text. Any letters that can be found are recognized as such, regardless of whether they can be assigned to a text column, a line, or even a word. Font size and typeface can vary arbitrarily, and text is not necessarily arranged in clearly recognizable columns or according to a fixed layout. Text and lines can overlap. This mode usually recognizes more text, especially with more complex layouts, but usually also produces the most false detections, since no result is discarded for deviating from the norm.
- Returns:
- ocrMode
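The three mode descriptions suggest a rough way to pick a mode from what is known about a document's layout. The chooser below is only one possible reading of those descriptions. OcrModeChooserSketch, its Mode enum, and the two boolean inputs are all hypothetical, and the real OperationOcr.OcrModeEnum constants may be named differently.

```java
// Hypothetical heuristic derived from the mode descriptions.
final class OcrModeChooserSketch {
    // Stand-in for the documented values pageSegments, column, unfiltered.
    enum Mode { PAGE_SEGMENTS, COLUMN, UNFILTERED }

    // regularLayout: text follows a clear layout without overlapping lines.
    // multiColumn:   text is arranged in several roughly uniform columns.
    static Mode choose(boolean regularLayout, boolean multiColumn) {
        if (!regularLayout) {
            // No layout assumptions hold, so accept more false detections.
            return Mode.UNFILTERED;
        }
        return multiColumn ? Mode.COLUMN : Mode.PAGE_SEGMENTS;
    }
}
```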
-
setOcrMode
public void setOcrMode(OperationOcr.OcrModeEnum ocrMode)
-
optimization
public OperationOcr optimization(OperationImageOptimization optimization)
-
getOptimization
@Nullable public @Nullable OperationImageOptimization getOptimization()
Get optimization
- Returns:
- optimization
-
setOptimization
public void setOptimization(OperationImageOptimization optimization)
-
outputFormat
public OperationOcr outputFormat(OperationOcr.OutputFormatEnum outputFormat)
-
getOutputFormat
@Nullable public @Nullable OperationOcr.OutputFormatEnum getOutputFormat()
Different output formats can be created during character recognition. By default, the result is generated as a PDF document, but it can also be output as an ASCII text document or as an XML document (hOCR) if desired.
* text = Text
* hocr = XML (hOCR)
* pdf = PDF
- Returns:
- outputFormat
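The values text, hocr, and pdf are the strings exchanged with the service, while Java code works with enum constants. The sketch below shows one common way to pair the two. OutputFormatSketch.Format is a hypothetical stand-in, and its constant names and the getValue/fromValue methods are assumptions rather than the documented members of OperationOcr.OutputFormatEnum.

```java
// Hypothetical enum that pairs constants with the documented string values.
final class OutputFormatSketch {
    enum Format {
        TEXT("text"),
        HOCR("hocr"),
        PDF("pdf");

        private final String value;

        Format(String value) {
            this.value = value;
        }

        // The string form listed in the description.
        String getValue() {
            return value;
        }

        // Looks up a constant by its string form; throws for unknown values.
        static Format fromValue(String value) {
            for (Format format : values()) {
                if (format.value.equals(value)) {
                    return format;
                }
            }
            throw new IllegalArgumentException("Unexpected value '" + value + "'");
        }
    }
}
```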
-
setOutputFormat
public void setOutputFormat(OperationOcr.OutputFormatEnum outputFormat)
-
page
public OperationOcr page(OperationOcrPage page)
-
getPage
@Nullable public @Nullable OperationOcrPage getPage()
Get page
- Returns:
- page
-
setPage
public void setPage(OperationOcrPage page)
-
pdfa
public OperationOcr pdfa(OperationPdfa pdfa)
-
getPdfa
@Nullable public @Nullable OperationPdfa getPdfa()
Get pdfa
- Returns:
- pdfa
-
setPdfa
public void setPdfa(OperationPdfa pdfa)
-
-