Package org.corpus_tools.pepper.cli
Class XMLTagExtractor
- java.lang.Object
  - org.xml.sax.helpers.DefaultHandler
    - org.xml.sax.ext.DefaultHandler2
      - org.corpus_tools.pepper.cli.XMLTagExtractor
- All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler, DeclHandler, EntityResolver2, LexicalHandler
public class XMLTagExtractor extends DefaultHandler2
This class is a helper class for developing PepperModules. The XMLTagExtractor generates a dictionary of the XML vocabulary of a source file: its XML tag names, XML namespaces and attribute names. From this dictionary it generates a Java interface and a Java class. The interface contains the XML namespace declarations and the XML element and attribute names as fields (public static final Strings). The generated Java class implements that interface and extends the DefaultHandler2 class, so that it can read an XML file that follows the generated dictionary.
This class can be very helpful when creating PepperImporter or PepperExporter classes that consume or produce XML formats. In that case, a sample XML file (containing most or, better, all of the elements) can be used to extract all element names as keys for the implementation.
For instance, the following XML file:

    <sentence xml:lang="en">
        <token pos="VBZ">Is</token>
        <token pos="DT" lemma="this">this</token>
        <token>example</token>
    </sentence>

will result in the following interface:

    public interface INTERFACE_NAME {
        public static final String TAG_TOKEN = "token";
        public static final String TAG_SENTENCE = "sentence";
        public static final String ATT_LEMMA = "lemma";
        public static final String ATT_XML_LANG = "xml:lang";
        public static final String ATT_POS = "pos";
    }

where INTERFACE_NAME is the name of the XML file, and in the following class:

    public class INTERFACE_NAMEReader extends DefaultHandler2 implements INTERFACE_NAME {
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            if (TAG_TOKEN.equals(qName)) {
            } else if (TAG_SENTENCE.equals(qName)) {
            }
        }
    }
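The mapping from the sample file to the field names shown above can be sketched with a plain SAX handler. This is only an illustration that uses the JDK alone, not the source of XMLTagExtractor: the class name VocabularySketch, the helper fieldName and its naming rule (prefix, underscore, upper-cased name with other characters replaced by an underscore) are assumptions inferred from the example fields.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical sketch, not the XMLTagExtractor source: collects the element and
// attribute names of an XML document and derives constant names from them.
class VocabularySketch extends DefaultHandler2 {
    final Set<String> tags = new TreeSet<>();
    final Set<String> attributes = new TreeSet<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        tags.add(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            attributes.add(atts.getQName(i));
        }
    }

    // Assumed naming rule: prefix + "_" + upper-cased name, where every
    // character outside A-Z, 0-9 and "_" is replaced by "_".
    static String fieldName(String prefix, String xmlName) {
        return prefix + "_" + xmlName.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9_]", "_");
    }

    // Parses the given XML text with a default (not namespace-aware) SAX parser.
    static VocabularySketch parse(String xml) {
        VocabularySketch handler = new VocabularySketch();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        String xml = "<sentence xml:lang=\"en\"><token pos=\"VBZ\">Is</token>"
            + "<token pos=\"DT\" lemma=\"this\">this</token><token>example</token></sentence>";
        VocabularySketch v = parse(xml);
        // Print one constant per collected element and attribute name.
        for (String tag : v.tags) {
            System.out.println("public static final String " + fieldName("TAG", tag) + " = \"" + tag + "\";");
        }
        for (String att : v.attributes) {
            System.out.println("public static final String " + fieldName("ATT", att) + " = \"" + att + "\";");
        }
    }
}
```

Sorted sets are used here only to make the printed order stable; the order in which XMLTagExtractor itself emits fields is not specified in this documentation.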
Using as a library:

    XMLTagExtractor extractor = new XMLTagExtractor();
    extractor.setXmlResource(input);
    extractor.setJavaResource(output);
    extractor.extract();
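Both setters take a java.net.URI, so the input and output values in the library snippet need to be URIs. A minimal sketch of preparing them from file paths follows; the class name ResourceUriSketch, the helper toUri and the paths sample.xml and generated are placeholders, and whether the output should name a file or a directory is not stated in this documentation.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper for preparing the URI arguments; not part of Pepper.
class ResourceUriSketch {
    // Converts a (possibly relative) file system path into an absolute file: URI.
    static URI toUri(String path) {
        return new File(path).toURI();
    }

    public static void main(String[] args) {
        URI input = toUri("sample.xml");  // placeholder path of the sample XML file
        URI output = toUri("generated");  // placeholder path for the generated Java code
        System.out.println(input);
        System.out.println(output);
        // With Pepper on the classpath, these values would then be passed on:
        //   extractor.setXmlResource(input);
        //   extractor.setJavaResource(output);
    }
}
```

Note that setXmlResource and setJavaResource are declared to throw FileNotFoundException, so the calling code has to handle or declare it.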
Running this tiny program from the command line:

    java org.corpus_tools.pepper.cli.XMLTagExtractor -i XML_FILE -o OUTPUT_PATH

- Author:
- Florian Zipser
-
-
Field Summary
Modifier and Type | Field | Description
static String | ARG_INPUT | Command-line argument that specifies the input file.
static String | ARG_OUTPUT | Command-line argument that specifies the output file.
static String | PREFIX_ATTRIBUTE | Name of the prefix for XML attributes.
static String | PREFIX_ELEMENT | Name of the prefix for XML tags.
static String | PREFIX_NAMESPACE | Name of the prefix for XML namespace prefixes.
static String | PREFIX_NAMESPACE_VALUE | Name of the prefix for XML namespaces.
-
Constructor Summary
Constructors
XMLTagExtractor()
-
Method Summary
Modifier and Type | Method | Description
void | extract() | Runs the extraction described in the class description above.
URI | getJavaResource() | Returns the location of the Java code to be generated.
URI | getXmlResource() | Returns the XML file to be parsed.
static void | main(String[] args) | Command-line entry point: java org.corpus_tools.pepper.cli.XMLTagExtractor -i XML_FILE -o OUTPUT_PATH
void | setJavaResource(URI resource) | Sets the location of the Java code to be generated.
void | setXmlResource(URI resource) | Sets the XML file to be parsed.
void | startElement(String uri, String localName, String qName, Attributes attributes) |
Methods inherited from class org.xml.sax.ext.DefaultHandler2
attributeDecl, comment, elementDecl, endCDATA, endDTD, endEntity, externalEntityDecl, getExternalSubset, internalEntityDecl, resolveEntity, resolveEntity, startCDATA, startDTD, startEntity
-
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
-
-
-
-
Field Detail
-
PREFIX_NAMESPACE
public static final String PREFIX_NAMESPACE
Name of the prefix for fields that hold XML namespace prefixes. For instance, the XML namespace prefix in <myns:token xmlns:myns="..."> will result in the field:

    NS_MYNS

- See Also:
- Constant Field Values
-
PREFIX_NAMESPACE_VALUE
public static final String PREFIX_NAMESPACE_VALUE
Name of the prefix for fields that hold XML namespaces. For instance, the XML namespace in <myns:token xmlns:myns="https://ns.de"> will result in the field:

    NS_VALUE_MYNS="https://ns.de"

- See Also:
- Constant Field Values
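The two namespace-related field names can be illustrated with a small namespace-aware SAX handler. This is a sketch under stated assumptions, not the XMLTagExtractor implementation: the class NamespaceFieldSketch, the constant values "NS" and "NS_VALUE" and the choice of the namespace prefix itself as the value of the NS_ field are inferred from the examples in this section.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical sketch: derives namespace field names from xmlns declarations.
class NamespaceFieldSketch extends DefaultHandler2 {
    // Assumed values; the real constants are listed under "Constant Field Values".
    static final String PREFIX_NAMESPACE = "NS";
    static final String PREFIX_NAMESPACE_VALUE = "NS_VALUE";

    // Field name -> field value, in declaration order.
    final Map<String, String> fields = new LinkedHashMap<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        if (prefix.isEmpty()) {
            return; // a default namespace has no prefix to name a field after
        }
        String key = prefix.toUpperCase(Locale.ROOT);
        fields.put(PREFIX_NAMESPACE + "_" + key, prefix);
        fields.put(PREFIX_NAMESPACE_VALUE + "_" + key, uri);
    }

    // Parses the XML text with a namespace-aware parser and returns the derived fields.
    static Map<String, String> namespaceFields(String xml) {
        NamespaceFieldSketch handler = new NamespaceFieldSketch();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for startPrefixMapping events
            factory.newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return handler.fields;
    }

    public static void main(String[] args) {
        String xml = "<myns:token xmlns:myns=\"https://ns.de\">example</myns:token>";
        namespaceFields(xml).forEach((name, value) ->
            System.out.println("public static final String " + name + " = \"" + value + "\";"));
    }
}
```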
-
PREFIX_ELEMENT
public static final String PREFIX_ELEMENT
Name of the prefix for XML tags. For instance, the XML tag <token> will result in the field:

    TAG_TOKEN

- See Also:
- Constant Field Values
-
PREFIX_ATTRIBUTE
public static final String PREFIX_ATTRIBUTE
Name of the prefix for XML attributes. For instance, the XML attribute pos in <token pos="..."> will result in the field:

    ATT_POS

- See Also:
- Constant Field Values
-
ARG_INPUT
public static final String ARG_INPUT
Command-line argument that specifies the input file.
- See Also:
- Constant Field Values
-
ARG_OUTPUT
public static final String ARG_OUTPUT
Command-line argument that specifies the output file.
- See Also:
- Constant Field Values
-
-
Method Detail
-
setXmlResource
public void setXmlResource(URI resource) throws FileNotFoundException
Sets the XML file to be parsed.
- Throws:
FileNotFoundException
-
getXmlResource
public URI getXmlResource()
Returns the XML file to be parsed.
-
setJavaResource
public void setJavaResource(URI resource) throws FileNotFoundException
Sets the location of the Java code to be generated.
- Throws:
FileNotFoundException
-
getJavaResource
public URI getJavaResource()
Returns the location of the Java code to be generated.
-
extract
public void extract()
Runs the extraction described in the class description above.
-
startElement
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
- Specified by:
startElement in interface ContentHandler
- Overrides:
startElement in class DefaultHandler
- Throws:
SAXException
-
main
public static void main(String[] args)
Command-line entry point; see the class description above. Usage:

    java org.corpus_tools.pepper.cli.XMLTagExtractor -i XML_FILE -o OUTPUT_PATH

- Parameters:
args - -i XML_FILE -o OUTPUT_PATH
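How the two options might be read from the argument array can be sketched as follows. This is not the code of main: the class ArgsSketch and its parse method are hypothetical, and the values "-i" and "-o" for ARG_INPUT and ARG_OUTPUT are assumed from the usage line.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of reading "-i XML_FILE -o OUTPUT_PATH"; not Pepper code.
class ArgsSketch {
    static final String ARG_INPUT = "-i";  // assumed value
    static final String ARG_OUTPUT = "-o"; // assumed value

    // Returns a map from each recognized option to the value that follows it.
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i++) {
            if (ARG_INPUT.equals(args[i]) || ARG_OUTPUT.equals(args[i])) {
                options.put(args[i], args[i + 1]);
                i++; // skip the value that was just consumed
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = parse(args);
        if (!options.containsKey(ARG_INPUT) || !options.containsKey(ARG_OUTPUT)) {
            System.err.println("usage: " + ARG_INPUT + " XML_FILE " + ARG_OUTPUT + " OUTPUT_PATH");
            return;
        }
        System.out.println("input:  " + options.get(ARG_INPUT));
        System.out.println("output: " + options.get(ARG_OUTPUT));
    }
}
```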
-
-