Class OcrType
- java.lang.Object
-
- net.webpdf.wsclient.schema.operation.OcrType
-
- All Implemented Interfaces:
ParameterInterface
public class OcrType extends Object implements ParameterInterface
The "OCR" web service runs character recognition on PDF documents or images. Images are converted to PDF documents: one page is generated per image, containing the original image and a text layer with the recognized text. Character recognition on PDF documents only works with documents that do not already contain text. Typically, these are documents generated by scanners that contain only one image per page.
Java class for OcrType complex type.
The following schema fragment specifies the expected content contained within this class.
<complexType name="OcrType">
  <complexContent>
    <restriction base="{http://www.w3.org/2001/XMLSchema}anyType">
      <all>
        <element name="page" type="{http://schema.webpdf.de/1.0/operation}OcrPageType" minOccurs="0"/>
        <element name="pdfa" type="{http://schema.webpdf.de/1.0/operation}PdfaType" minOccurs="0"/>
        <element name="optimization" type="{http://schema.webpdf.de/1.0/operation}ImageOptimizationType" minOccurs="0"/>
      </all>
      <attribute name="language" type="{http://schema.webpdf.de/1.0/operation}OcrLanguageType" default="eng" />
      <attribute name="outputFormat" default="pdf">
        <simpleType>
          <restriction base="{http://schema.webpdf.de/1.0/operation}OcrOutputType">
          </restriction>
        </simpleType>
      </attribute>
      <attribute name="checkResolution" type="{http://www.w3.org/2001/XMLSchema}boolean" default="true" />
      <attribute name="imageDpi" default="200">
        <simpleType>
          <restriction base="{http://schema.webpdf.de/1.0/operation}DpiType">
          </restriction>
        </simpleType>
      </attribute>
      <attribute name="forceEachPage" type="{http://www.w3.org/2001/XMLSchema}boolean" default="false" />
      <attribute name="normalizePageRotation" type="{http://www.w3.org/2001/XMLSchema}boolean" default="false" />
      <attribute name="failOnWarning" type="{http://www.w3.org/2001/XMLSchema}boolean" default="false" />
      <attribute name="jpegQuality" default="75">
        <simpleType>
          <restriction base="{http://www.w3.org/2001/XMLSchema}int">
            <minInclusive value="0"/>
            <maxInclusive value="100"/>
          </restriction>
        </simpleType>
      </attribute>
      <attribute name="ocrMode" type="{http://schema.webpdf.de/1.0/operation}OcrModeType" default="pageSegments" />
    </restriction>
  </complexContent>
</complexType>
-
-
Field Summary
Fields
Modifier and Type | Field
protected Boolean | checkResolution
protected Boolean | failOnWarning
protected Boolean | forceEachPage
protected Integer | imageDpi
protected Integer | jpegQuality
protected OcrLanguageType | language
protected Boolean | normalizePageRotation
protected OcrModeType | ocrMode
protected ImageOptimizationType | optimization
protected OcrOutputType | outputFormat
protected OcrPageType | page
protected PdfaType | pdfa
-
Constructor Summary
Constructors
Constructor
OcrType()
-
Method Summary
Modifier and Type | Method | Description
int | getImageDpi() |
int | getJpegQuality() |
OcrLanguageType | getLanguage() |
OcrModeType | getOcrMode() |
ImageOptimizationType | getOptimization() | Gets the value of the optimization property.
OcrOutputType | getOutputFormat() |
OcrPageType | getPage() | Gets the value of the page property.
PdfaType | getPdfa() | Gets the value of the pdfa property.
boolean | isCheckResolution() |
boolean | isFailOnWarning() |
boolean | isForceEachPage() |
boolean | isNormalizePageRotation() |
boolean | isSetCheckResolution() |
boolean | isSetFailOnWarning() |
boolean | isSetForceEachPage() |
boolean | isSetImageDpi() |
boolean | isSetJpegQuality() |
boolean | isSetLanguage() |
boolean | isSetNormalizePageRotation() |
boolean | isSetOcrMode() |
boolean | isSetOptimization() |
boolean | isSetOutputFormat() |
boolean | isSetPage() |
boolean | isSetPdfa() |
void | setCheckResolution(boolean value) | Sets the value of the checkResolution property.
void | setFailOnWarning(boolean value) | Sets the value of the failOnWarning property.
void | setForceEachPage(boolean value) | Sets the value of the forceEachPage property.
void | setImageDpi(int value) | Sets the value of the imageDpi property.
void | setJpegQuality(int value) | Sets the value of the jpegQuality property.
void | setLanguage(OcrLanguageType value) | Sets the value of the language property.
void | setNormalizePageRotation(boolean value) | Sets the value of the normalizePageRotation property.
void | setOcrMode(OcrModeType value) | Sets the value of the ocrMode property.
void | setOptimization(ImageOptimizationType value) | Sets the value of the optimization property.
void | setOutputFormat(OcrOutputType value) | Sets the value of the outputFormat property.
void | setPage(OcrPageType value) | Sets the value of the page property.
void | setPdfa(PdfaType value) | Sets the value of the pdfa property.
void | unsetCheckResolution() |
void | unsetFailOnWarning() |
void | unsetForceEachPage() |
void | unsetImageDpi() |
void | unsetJpegQuality() |
void | unsetNormalizePageRotation() |
-
-
-
Field Detail
-
page
protected OcrPageType page
-
pdfa
protected PdfaType pdfa
-
optimization
protected ImageOptimizationType optimization
-
language
protected OcrLanguageType language
Specifies the language for the output document (PDF/image). The language must be defined for character recognition (OCR) so that the special characters of the respective language (e.g. "üäö" in German) can be recognized more reliably. The following languages are currently supported:
- eng = English
- fra = French
- spa = Spanish
- deu = German
- ita = Italian
-
outputFormat
protected OcrOutputType outputFormat
Character recognition can produce different output formats. Generally, the result is generated as a PDF document, but it can also be output as a text (ASCII) document or as an XML (hOCR) document if desired.
- text = Text
- hocr = XML (hOCR)
- pdf = PDF
-
checkResolution
protected Boolean checkResolution
If "true", the DPI resolution of the output file is checked. Resolutions below 200 DPI are rejected by this check because, as a rule, they do not produce good character recognition results.
-
imageDpi
protected Integer imageDpi
Sets the minimum resolution at which images are embedded in the resulting PDF documents. If the value is 0, images are embedded with resolutions and dimensions as close as possible to those of the original source images.
-
forceEachPage
protected Boolean forceEachPage
If a PDF document contains text content on any page, the web service refuses to run character recognition again. If "true" is passed for this option, however, each page of the document is considered individually, and character recognition is run on every page that does not contain text (layers), so that a new text layer is generated for it.
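The page selection described above can be modeled as follows. This is only an illustrative sketch, not webPDF code: the class name, the boolean-array input, and the use of an exception to represent the service's refusal are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model of the documented forceEachPage behavior (hypothetical, not webPDF code). */
final class ForceEachPageSketch {

    /**
     * Returns the zero-based indexes of the pages that would receive a new text layer.
     *
     * @param pageHasText   hypothetical input: true where a page already contains text
     * @param forceEachPage the documented option
     * @throws IllegalStateException assumed stand-in for the service refusing the document
     */
    static List<Integer> pagesToRecognize(boolean[] pageHasText, boolean forceEachPage) {
        List<Integer> textFreePages = new ArrayList<>();
        boolean anyText = false;
        for (int i = 0; i < pageHasText.length; i++) {
            if (pageHasText[i]) {
                anyText = true;
            } else {
                textFreePages.add(i);
            }
        }
        // Without forceEachPage, a single page with text blocks the whole document.
        if (anyText && !forceEachPage) {
            throw new IllegalStateException("document already contains text");
        }
        // Otherwise only the pages without text are processed.
        return textFreePages;
    }
}
```

For a three-page document whose second page already contains text, the sketch selects the first and third pages when forceEachPage is true and refuses the document when it is false.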
-
normalizePageRotation
protected Boolean normalizePageRotation
If "true", the system attempts to rotate pages containing rotated text so that the text in the document no longer appears rotated and is shown upright.
-
failOnWarning
protected Boolean failOnWarning
If "true", character recognition fails even on warnings that do not prevent recognition but make a meaningful result very unlikely.
-
jpegQuality
protected Integer jpegQuality
A percentage that sets the compression ratio and thereby influences the quality of the JPEG images embedded in the resulting PDF documents. Higher values result in less compressed images of higher quality.
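The schema fragment restricts jpegQuality to the range 0 to 100 with a default of 75. A caller may want to check a value before passing it to setJpegQuality(int). The helper below is a minimal sketch of such a client-side check; the class and method names are hypothetical and not part of the library, and whether the setter itself validates the range is not stated on this page.

```java
/** Hypothetical client-side range check mirroring the schema bounds for jpegQuality. */
final class JpegQualitySketch {
    static final int MIN_QUALITY = 0;      // minInclusive in the schema fragment
    static final int MAX_QUALITY = 100;    // maxInclusive in the schema fragment
    static final int DEFAULT_QUALITY = 75; // schema default

    /** Returns the value unchanged if it lies within the schema range, otherwise throws. */
    static int requireValidQuality(int value) {
        if (value < MIN_QUALITY || value > MAX_QUALITY) {
            throw new IllegalArgumentException(
                    "jpegQuality must be between " + MIN_QUALITY + " and " + MAX_QUALITY
                            + ", got " + value);
        }
        return value;
    }
}
```

With an OcrType instance named ocr, the check could wrap the setter call as ocr.setJpegQuality(JpegQualitySketch.requireValidQuality(90)).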
-
ocrMode
protected OcrModeType ocrMode
Specifies the mode used to find structured text on the pages. Each mode places different requirements on the text and makes different assumptions about it.
- pageSegments = The text on the page is clearly structured and can be decomposed into distinct paragraphs and layout segments. Text elements and lines do not overlap. Headings, and thus text with differing sizes and fonts, may be present.
- column = The text is arranged on the pages in several more or less uniform columns next to each other. Font and text size are mostly uniform.
- unfiltered = No assumptions are made about the text. Any letters that can be found are recognized as such, regardless of whether they can be assigned to a text column, a line, or even a word. Font size and typeface can vary freely, and text is not necessarily arranged in clearly recognizable columns or according to a fixed layout. Text and lines can overlap. This mode usually recognizes more text, especially in complex layouts, but usually also produces the most false detections, since no result is discarded for deviating from the norm.
-
-
Method Detail
-
getPage
public OcrPageType getPage()
Gets the value of the page property.
Returns:
possible object is OcrPageType
-
setPage
public void setPage(OcrPageType value)
Sets the value of the page property.
Parameters:
value - allowed object is OcrPageType
-
isSetPage
public boolean isSetPage()
-
getPdfa
public PdfaType getPdfa()
Gets the value of the pdfa property.
Returns:
possible object is PdfaType
-
setPdfa
public void setPdfa(PdfaType value)
Sets the value of the pdfa property.
Parameters:
value - allowed object is PdfaType
-
isSetPdfa
public boolean isSetPdfa()
-
getOptimization
public ImageOptimizationType getOptimization()
Gets the value of the optimization property.
Returns:
possible object is ImageOptimizationType
-
setOptimization
public void setOptimization(ImageOptimizationType value)
Sets the value of the optimization property.
Parameters:
value - allowed object is ImageOptimizationType
-
isSetOptimization
public boolean isSetOptimization()
-
getLanguage
public OcrLanguageType getLanguage()
Specifies the language for the output document (PDF/image). The language must be defined for character recognition (OCR) so that the special characters of the respective language (e.g. "üäö" in German) can be recognized more reliably. The following languages are currently supported:
- eng = English
- fra = French
- spa = Spanish
- deu = German
- ita = Italian
Returns:
possible object is OcrLanguageType
-
setLanguage
public void setLanguage(OcrLanguageType value)
Sets the value of the language property.
Parameters:
value - allowed object is OcrLanguageType
See Also:
getLanguage()
-
isSetLanguage
public boolean isSetLanguage()
-
getOutputFormat
public OcrOutputType getOutputFormat()
Character recognition can produce different output formats. Generally, the result is generated as a PDF document, but it can also be output as a text (ASCII) document or as an XML (hOCR) document if desired.
- text = Text
- hocr = XML (hOCR)
- pdf = PDF
Returns:
possible object is OcrOutputType
-
setOutputFormat
public void setOutputFormat(OcrOutputType value)
Sets the value of the outputFormat property.
Parameters:
value - allowed object is OcrOutputType
See Also:
getOutputFormat()
-
isSetOutputFormat
public boolean isSetOutputFormat()
-
isCheckResolution
public boolean isCheckResolution()
If "true", the DPI resolution of the output file is checked. Resolutions below 200 DPI are rejected by this check because, as a rule, they do not produce good character recognition results.
Returns:
possible object is Boolean
-
setCheckResolution
public void setCheckResolution(boolean value)
Sets the value of the checkResolution property.
Parameters:
value - allowed object is Boolean
See Also:
isCheckResolution()
-
isSetCheckResolution
public boolean isSetCheckResolution()
-
unsetCheckResolution
public void unsetCheckResolution()
-
getImageDpi
public int getImageDpi()
Sets the minimum resolution at which images are embedded in the resulting PDF documents. If the value is 0, images are embedded with resolutions and dimensions as close as possible to those of the original source images.
Returns:
possible object is Integer
-
setImageDpi
public void setImageDpi(int value)
Sets the value of the imageDpi property.
Parameters:
value - allowed object is Integer
See Also:
getImageDpi()
-
isSetImageDpi
public boolean isSetImageDpi()
-
unsetImageDpi
public void unsetImageDpi()
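The four imageDpi accessors work on a boxed Integer field while the getter returns a primitive int. The stand-in class below sketches how such accessors usually interact in JAXB-generated code for an optional attribute with a default value. It assumes the common XJC pattern (null means "not set", and the getter falls back to the schema default of 200); the actual OcrType source is not shown on this page, so treat the method bodies as an assumption rather than the library's implementation.

```java
/** Hypothetical stand-in mirroring only the documented imageDpi signatures of OcrType. */
final class ImageDpiAccessorSketch {
    // Boxed field, as in the documented "protected Integer imageDpi"; null means "not set".
    protected Integer imageDpi;

    public int getImageDpi() {
        // Assumed behavior: fall back to the schema default while no explicit value is present.
        return imageDpi == null ? 200 : imageDpi;
    }

    public void setImageDpi(int value) {
        this.imageDpi = value;
    }

    public boolean isSetImageDpi() {
        // True only after an explicit set, even if the value equals the default.
        return imageDpi != null;
    }

    public void unsetImageDpi() {
        // Clears the explicit value so that the default applies again.
        this.imageDpi = null;
    }
}
```

Under this assumption, isSetImageDpi() distinguishes an explicitly chosen value from the default, and unsetImageDpi() restores the default without having to know its value.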
-
isForceEachPage
public boolean isForceEachPage()
If a PDF document contains text content on any page, the web service refuses to run character recognition again. If "true" is passed for this option, however, each page of the document is considered individually, and character recognition is run on every page that does not contain text (layers), so that a new text layer is generated for it.
Returns:
possible object is Boolean
-
setForceEachPage
public void setForceEachPage(boolean value)
Sets the value of the forceEachPage property.
Parameters:
value - allowed object is Boolean
See Also:
isForceEachPage()
-
isSetForceEachPage
public boolean isSetForceEachPage()
-
unsetForceEachPage
public void unsetForceEachPage()
-
isNormalizePageRotation
public boolean isNormalizePageRotation()
If "true", the system attempts to rotate pages containing rotated text so that the text in the document no longer appears rotated and is shown upright.
Returns:
possible object is Boolean
-
setNormalizePageRotation
public void setNormalizePageRotation(boolean value)
Sets the value of the normalizePageRotation property.
Parameters:
value - allowed object is Boolean
See Also:
isNormalizePageRotation()
-
isSetNormalizePageRotation
public boolean isSetNormalizePageRotation()
-
unsetNormalizePageRotation
public void unsetNormalizePageRotation()
-
isFailOnWarning
public boolean isFailOnWarning()
If "true", character recognition fails even on warnings that do not prevent recognition but make a meaningful result very unlikely.
Returns:
possible object is Boolean
-
setFailOnWarning
public void setFailOnWarning(boolean value)
Sets the value of the failOnWarning property.
Parameters:
value - allowed object is Boolean
See Also:
isFailOnWarning()
-
isSetFailOnWarning
public boolean isSetFailOnWarning()
-
unsetFailOnWarning
public void unsetFailOnWarning()
-
getJpegQuality
public int getJpegQuality()
A percentage that sets the compression ratio and thereby influences the quality of the JPEG images embedded in the resulting PDF documents. Higher values result in less compressed images of higher quality.
Returns:
possible object is Integer
-
setJpegQuality
public void setJpegQuality(int value)
Sets the value of the jpegQuality property.
Parameters:
value - allowed object is Integer
See Also:
getJpegQuality()
-
isSetJpegQuality
public boolean isSetJpegQuality()
-
unsetJpegQuality
public void unsetJpegQuality()
-
getOcrMode
public OcrModeType getOcrMode()
Specifies the mode used to find structured text on the pages. Each mode places different requirements on the text and makes different assumptions about it.
- pageSegments = The text on the page is clearly structured and can be decomposed into distinct paragraphs and layout segments. Text elements and lines do not overlap. Headings, and thus text with differing sizes and fonts, may be present.
- column = The text is arranged on the pages in several more or less uniform columns next to each other. Font and text size are mostly uniform.
- unfiltered = No assumptions are made about the text. Any letters that can be found are recognized as such, regardless of whether they can be assigned to a text column, a line, or even a word. Font size and typeface can vary freely, and text is not necessarily arranged in clearly recognizable columns or according to a fixed layout. Text and lines can overlap. This mode usually recognizes more text, especially in complex layouts, but usually also produces the most false detections, since no result is discarded for deviating from the norm.
Returns:
possible object is OcrModeType
-
setOcrMode
public void setOcrMode(OcrModeType value)
Sets the value of the ocrMode property.
Parameters:
value - allowed object is OcrModeType
See Also:
getOcrMode()
-
isSetOcrMode
public boolean isSetOcrMode()
-
-