Class XMLTagExtractor

  • All Implemented Interfaces:
    ContentHandler, DTDHandler, EntityResolver, ErrorHandler, DeclHandler, EntityResolver2, LexicalHandler

    public class XMLTagExtractor
    extends DefaultHandler2
    This class is a helper for developing PepperModules. The XMLTagExtractor builds a dictionary of the XML vocabulary used in a source file, consisting of its element (tag) names, XML namespaces and attribute names. From that dictionary it generates a Java interface and a Java class. The interface declares the XML namespaces, element names and attribute names as fields (public static final Strings). The generated class implements that interface and extends DefaultHandler2, so that it can read XML files that follow the generated dictionary.
    This class is especially helpful when creating PepperImporter or PepperExporter classes that consume or produce XML formats. In that case, a sample XML file containing most, ideally all, of the elements can be used to extract all element names as keys for the implementation.
    For instance, the following XML file:
     <sentence xml:lang="en">
       <token pos="VBZ">Is</token>
       <token pos="DT" lemma="this">this</token>
       <token>example</token>
     </sentence>
     
    will result in the following interface:
     public interface INTERFACE_NAME {
            public static final String TAG_TOKEN = "token";
            public static final String TAG_SENTENCE = "sentence";
            public static final String ATT_LEMMA = "lemma";
            public static final String ATT_XML_LANG = "xml:lang";
            public static final String ATT_POS = "pos";
     }
     
    where INTERFACE_NAME is the name of the XML file. The following class is generated as well:
     public class INTERFACE_NAMEReader extends DefaultHandler2 implements INTERFACE_NAME {
            public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
                    if (TAG_TOKEN.equals(qName)) {
                    } else if (TAG_SENTENCE.equals(qName)) {
                    }
            }
     }
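    The generated reader is only a skeleton with empty branches. The sketch below hand-writes what such a pair could look like for the sample file above once the branches are filled in, and drives it with the JDK's SAX parser. The interface name Sample (standing in for INTERFACE_NAME), the class SampleReader, and its words, posTags and language fields are hypothetical illustrations, not output of the real generator.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical stand-in for a generated INTERFACE_NAME interface (fields copied from the example).
interface Sample {
    String TAG_TOKEN = "token";
    String TAG_SENTENCE = "sentence";
    String ATT_LEMMA = "lemma";
    String ATT_XML_LANG = "xml:lang";
    String ATT_POS = "pos";
}

// Hypothetical reader: the generated skeleton with its branches filled in by hand.
public class SampleReader extends DefaultHandler2 implements Sample {
    final List<String> words = new ArrayList<>();
    final List<String> posTags = new ArrayList<>();
    String language;
    private StringBuilder text; // non-null only while inside a <token> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (TAG_TOKEN.equals(qName)) {
            text = new StringBuilder();
            posTags.add(attributes.getValue(ATT_POS)); // null when the attribute is absent
        } else if (TAG_SENTENCE.equals(qName)) {
            language = attributes.getValue(ATT_XML_LANG);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // SAX may deliver text in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (TAG_TOKEN.equals(qName)) {
            words.add(text.toString());
            text = null;
        }
    }

    public static SampleReader read(String xml) {
        SampleReader reader = new SampleReader();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), reader);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return reader;
    }

    public static void main(String[] args) {
        SampleReader r = read("<sentence xml:lang=\"en\">"
                + "<token pos=\"VBZ\">Is</token>"
                + "<token pos=\"DT\" lemma=\"this\">this</token>"
                + "<token>example</token></sentence>");
        System.out.println(r.language + " " + r.words + " " + r.posTags);
        // en [Is, this, example] [VBZ, DT, null]
    }
}
```

    Comparing qName against the interface constants keeps the string literals in one place, which is the point of generating the dictionary in the first place.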
     

    Usage as a library:
     XMLTagExtractor extractor = new XMLTagExtractor();
     extractor.setXmlResource(input);
     extractor.setJavaResource(output);
     extractor.extract();
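    The collection step behind extract() can be pictured as a single SAX pass that records every element and attribute name it encounters. The following is a minimal sketch of that idea using only JDK classes; the class NameCollector and its members are hypothetical and are not XMLTagExtractor's actual implementation.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical sketch: gathers the element and attribute names of a document in one SAX pass.
public class NameCollector extends DefaultHandler2 {
    final Set<String> tags = new TreeSet<>();       // sorted; repeated names collapse
    final Set<String> attributes = new TreeSet<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        tags.add(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            attributes.add(atts.getQName(i));
        }
    }

    public static NameCollector collect(String xml) {
        NameCollector collector = new NameCollector();
        try {
            // The default factory is not namespace-aware, so qName keeps prefixes such as "xml:lang".
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return collector;
    }

    public static void main(String[] args) {
        NameCollector c = collect("<sentence xml:lang=\"en\">"
                + "<token pos=\"VBZ\">Is</token>"
                + "<token pos=\"DT\" lemma=\"this\">this</token>"
                + "<token>example</token></sentence>");
        System.out.println(c.tags);       // [sentence, token]
        System.out.println(c.attributes); // [lemma, pos, xml:lang]
    }
}
```

    In this non-namespace-aware mode, namespace declarations such as xmlns:myns would also show up as ordinary attributes; a fuller version would route them to the namespace part of the dictionary instead.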
     

    Running it from the command line (pass the fully qualified class name, with XMLTagExtractor and its dependencies on the classpath):

    java XMLTagExtractor -i XML_FILE -o OUTPUT_PATH
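    The command line consists of flag/value pairs. A minimal sketch of how such pairs could be read follows; the flag strings "-i" and "-o" come from the usage line above, while the assumption that ARG_INPUT and ARG_OUTPUT hold exactly these values, and the class CliArgs itself, are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of reading "-i XML_FILE -o OUTPUT_PATH"; not the actual XMLTagExtractor code.
public class CliArgs {
    static final String ARG_INPUT = "-i";  // assumed value, mirrors the usage line
    static final String ARG_OUTPUT = "-o"; // assumed value, mirrors the usage line

    static Map<String, String> parse(String[] args) {
        Map<String, String> values = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String flag = args[i];
            if (!ARG_INPUT.equals(flag) && !ARG_OUTPUT.equals(flag)) {
                throw new IllegalArgumentException("unknown argument: " + flag);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("missing value after " + flag);
            }
            values.put(flag, args[++i]); // consume the value that follows the flag
        }
        if (!values.containsKey(ARG_INPUT) || !values.containsKey(ARG_OUTPUT)) {
            throw new IllegalArgumentException("usage: -i XML_FILE -o OUTPUT_PATH");
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> v = parse(new String[] {"-i", "sample.xml", "-o", "out/"});
        System.out.println(v.get(ARG_INPUT) + " -> " + v.get(ARG_OUTPUT)); // sample.xml -> out/
    }
}
```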
    Author:
    Florian Zipser
    • Field Detail

      • PREFIX_NAMESPACE

        public static final String PREFIX_NAMESPACE
        Prefix for fields holding XML namespace prefixes. For instance, the namespace prefix in <myns:token xmlns:myns="..."> results in the field:
        NS_MYNS
        See Also:
        Constant Field Values
      • PREFIX_NAMESPACE_VALUE

        public static final String PREFIX_NAMESPACE_VALUE
        Prefix for fields holding XML namespace URIs. For instance, the namespace in <myns:token xmlns:myns="https://ns.de"> results in the field:
        NS_VALUE_MYNS="https://ns.de"
        See Also:
        Constant Field Values
      • PREFIX_ELEMENT

        public static final String PREFIX_ELEMENT
        Prefix for fields holding XML element (tag) names. For instance, the tag <token> results in the field:
        TAG_TOKEN
        See Also:
        Constant Field Values
      • PREFIX_ATTRIBUTE

        public static final String PREFIX_ATTRIBUTE
        Prefix for fields holding XML attribute names. For instance, the attribute in <token pos="..."> results in the field:
        ATT_POS
        See Also:
        Constant Field Values
      • ARG_INPUT

        public static final String ARG_INPUT
        Command-line argument that specifies the input file.
        See Also:
        Constant Field Values
      • ARG_OUTPUT

        public static final String ARG_OUTPUT
        Command-line argument that specifies the output file.
        See Also:
        Constant Field Values
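    The field names in the examples above follow one pattern: a prefix plus the upper-cased XML name, with ':' turned into '_' (xml:lang becomes ATT_XML_LANG). A minimal sketch of that mapping and of emitting the interface body follows. The prefix values "NS_", "NS_VALUE_", "TAG_" and "ATT_" are inferred from the examples rather than read from the constants, mapping every other non-identifier character to '_' is an assumption, and the class FieldNames is hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of the naming convention; prefix values are inferred from the examples.
public class FieldNames {
    static final String PREFIX_NAMESPACE = "NS_";
    static final String PREFIX_NAMESPACE_VALUE = "NS_VALUE_";
    static final String PREFIX_ELEMENT = "TAG_";
    static final String PREFIX_ATTRIBUTE = "ATT_";

    // "xml:lang" -> "XML_LANG"; any character not legal in a Java identifier becomes '_' (assumption).
    static String fieldName(String prefix, String xmlName) {
        return prefix + xmlName.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9_]", "_");
    }

    // Emits one constant per name; values are not escaped, so they must not contain '"' or '\'.
    static String interfaceSource(String interfaceName, Set<String> tags, Set<String> attributes,
                                  Map<String, String> namespaces) {
        StringBuilder sb = new StringBuilder("public interface " + interfaceName + " {\n");
        for (Map.Entry<String, String> ns : namespaces.entrySet()) {
            constant(sb, fieldName(PREFIX_NAMESPACE, ns.getKey()), ns.getKey());
            constant(sb, fieldName(PREFIX_NAMESPACE_VALUE, ns.getKey()), ns.getValue());
        }
        for (String tag : tags) {
            constant(sb, fieldName(PREFIX_ELEMENT, tag), tag);
        }
        for (String att : attributes) {
            constant(sb, fieldName(PREFIX_ATTRIBUTE, att), att);
        }
        return sb.append("}\n").toString();
    }

    private static void constant(StringBuilder sb, String field, String value) {
        sb.append("    public static final String ").append(field)
          .append(" = \"").append(value).append("\";\n");
    }

    public static void main(String[] args) {
        System.out.print(interfaceSource("Sample",
                new TreeSet<>(Set.of("sentence", "token")),
                new TreeSet<>(Set.of("pos", "lemma", "xml:lang")),
                new TreeMap<>(Map.of("myns", "https://ns.de"))));
    }
}
```

    Passing sorted collections (TreeSet, TreeMap) keeps the generated file stable between runs, which makes diffs of regenerated interfaces easier to review.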
    • Constructor Detail

      • XMLTagExtractor

        public XMLTagExtractor()