Class StanfordNlpPipeline


  • public abstract class StanfordNlpPipeline
    extends Object
    A language-specific configuration for a Stanford NLP based processing pipeline. Subclasses need to register themselves as services so that they can be injected into the StanfordNlpProcessor.
    Author:
    Rupert Westenthaler
    • Field Detail

      • log

        protected final org.slf4j.Logger log
    • Constructor Detail

      • StanfordNlpPipeline

        protected StanfordNlpPipeline​(String name,
                                      Locale locale)
    • Method Detail

      • getPosTagset

        protected abstract TagSet<PosTag> getPosTagset()
        Getter for the TagSet of known PosTags.
        Returns:
        the PosTag set or null if none is available.
      • getNerTagset

        protected abstract TagSet<NerTag> getNerTagset()
        Getter for the TagSet of known NerTags.
        Returns:
        the NerTag set or null if none is available.
      • getRelTagset

        protected abstract TagSet<RelTag> getRelTagset()
        Getter for the TagSet of known RelTags.
        Returns:
        the RelTag set or null if none is available.
      • getLanguagePack

        public abstract edu.stanford.nlp.trees.TreebankLanguagePack getLanguagePack()
        The Treebank LanguagePack
        Returns:
        The treebank LanguagePack or null if none available
      • setProperties

        protected final void setProperties​(Properties properties)
        The properties used by activate(). These MUST be set before the component is activated.
        Parameters:
        properties - the properties
      • setCaseSensitive

        protected void setCaseSensitive​(boolean caseSensitive)
        Setter for the caseSensitive state. If this pipeline uses caseless models, this MUST be set to false so that the extractor parses a lower-case version of the text.
        Parameters:
        caseSensitive - the case sensitive state
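The effect of the caseSensitive flag can be pictured with a minimal sketch. The class CaseHandlingSketch and the method textForAnalysis are hypothetical names used only for illustration; how the real extractor lower-cases the text is not documented here.

```java
import java.util.Locale;

// Hypothetical helper, not part of the documented API.
class CaseHandlingSketch {

    /**
     * Returns the text that would be handed to the models: unchanged for
     * case-sensitive models, lower-cased (using the pipeline locale) otherwise.
     */
    static String textForAnalysis(String text, boolean caseSensitive, Locale locale) {
        return caseSensitive ? text : text.toLowerCase(locale);
    }

    public static void main(String[] args) {
        System.out.println(textForAnalysis("Paris is in France", false, Locale.ENGLISH));
        System.out.println(textForAnalysis("Paris is in France", true, Locale.ENGLISH));
    }
}
```

Passing the locale explicitly avoids locale-dependent surprises when lower-casing.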
      • getProperties

        public Properties getProperties()
        Getter for the properties holding the Stanford NLP configuration used by this component
        Returns:
        the properties or null if not set.
      • isCaseSensitive

        public boolean isCaseSensitive()
        Whether this pipeline uses case-sensitive models.
      • getAnnotators

        public List<String> getAnnotators()
        Read only list of the annotators used by the configured pipeline. Only available after activation
        Returns:
        the list of annotator (names) used in the pipeline
      • getAnnotator

        public edu.stanford.nlp.pipeline.Annotator getAnnotator​(String name)
        Getter for the Annotator. Only available after activation
        Parameters:
        name - the name of the annotator as configured in the pipeline. Valid strings are members of the getAnnotators() list
        Returns:
        the Annotator
        Throws:
        IllegalStateException - if the passed name is not part of the getAnnotators() list
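The contract between getAnnotators() and getAnnotator(String) might look roughly like the following sketch. AnnotatorLookupSketch is a hypothetical stand-in that stores plain Objects instead of real Annotator instances; only the read-only list and the IllegalStateException for unknown names are taken from the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the lookup part of the pipeline.
class AnnotatorLookupSketch {

    // Insertion-ordered so the list reflects the configured pipeline order.
    private final Map<String, Object> annotators = new LinkedHashMap<>();

    AnnotatorLookupSketch(Map<String, Object> configured) {
        annotators.putAll(configured);
    }

    /** Read-only list of the configured annotator names. */
    List<String> getAnnotators() {
        return Collections.unmodifiableList(new ArrayList<>(annotators.keySet()));
    }

    /** Returns the annotator for the name, or fails for names not in the list. */
    Object getAnnotator(String name) {
        Object annotator = annotators.get(name);
        if (annotator == null) {
            throw new IllegalStateException(
                "'" + name + "' is not one of the configured annotators " + getAnnotators());
        }
        return annotator;
    }
}
```

Callers that are unsure whether an annotator is configured can check getAnnotators().contains(name) first instead of catching the exception.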
      • setAnnotatorImplementation

        protected final void setAnnotatorImplementation​(edu.stanford.nlp.pipeline.AnnotatorImplementations annotatorImplementation)
        Sets a custom AnnotatorImplementations class. Needs to be called before activation (typically as part of the constructor of the sub-class)
        Parameters:
        annotatorImplementation - the custom AnnotatorImplementations instance to use
      • activate

        public final void activate()
                            throws IOException
        Activates the component based on its configuration defined in the #
        Throws:
        IOException
      • doActivate

        protected void doActivate()
        Activation hook
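The final activate() / deactivate() methods together with the overridable doActivate() / doDeactivate() hooks follow the template-method pattern. The sketch below illustrates that pattern under stated assumptions: LifecycleSketch and EnglishSketch are hypothetical classes, and throwing an IllegalStateException when no properties are set is an assumption, not documented behaviour.

```java
import java.io.IOException;
import java.util.Properties;

// Hypothetical base class mirroring the documented lifecycle methods.
abstract class LifecycleSketch {
    private Properties properties;
    private boolean active;

    protected final void setProperties(Properties properties) {
        this.properties = properties;
    }

    public final void activate() throws IOException {
        if (properties == null) {
            // Assumption: the real class may react differently to missing properties.
            throw new IllegalStateException("properties must be set before activation");
        }
        doActivate();   // subclass hook
        active = true;
    }

    /** Activation hook; the default does nothing. */
    protected void doActivate() { }

    public final void deactivate() {
        active = false;
        doDeactivate(); // subclass hook
    }

    /** Deactivation hook; the default does nothing. */
    protected void doDeactivate() { }

    public final boolean isActive() {
        return active;
    }
}

// Hypothetical subclass: sets its properties in the constructor, as required.
class EnglishSketch extends LifecycleSketch {
    boolean modelLoaded;

    EnglishSketch() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        setProperties(props);
    }

    @Override
    protected void doActivate() { modelLoaded = true; }

    @Override
    protected void doDeactivate() { modelLoaded = false; }
}
```

Keeping activate() final lets the base class enforce the ordering (properties first, hook second), while subclasses only customise the hook.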
      • initAnnotatorPool

        protected void initAnnotatorPool​(Properties properties)
        Initializes the Annotators as referenced by the 'annotators' field of the passed properties.

        NOTE: This will only initialize AnnotatorFactories that are actually used in the configured pipeline (the getAnnotators() list)

        Parameters:
        properties - the properties
        annotatorImplementation - annotator impl instance
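The idea of initializing only the annotators named in the 'annotators' property can be sketched as follows. AnnotatorPoolSketch, parseAnnotators and initPool are hypothetical names, the Supplier-based factories stand in for the real annotator factories, and splitting the value on commas and whitespace is an assumption about the property format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical illustration, not the actual implementation.
class AnnotatorPoolSketch {

    /** Splits the 'annotators' value into names (assumed comma/whitespace separated). */
    static List<String> parseAnnotators(Properties props) {
        List<String> names = new ArrayList<>();
        for (String name : props.getProperty("annotators", "").split("[,\\s]+")) {
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return Collections.unmodifiableList(names);
    }

    /** Creates only those annotators that the configured pipeline references. */
    static Map<String, Object> initPool(Properties props,
                                        Map<String, Supplier<Object>> factories) {
        Map<String, Object> pool = new LinkedHashMap<>();
        for (String name : parseAnnotators(props)) {
            Supplier<Object> factory = factories.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("No factory for annotator '" + name + "'");
            }
            pool.put(name, factory.get()); // factories of unused annotators are never called
        }
        return pool;
    }
}
```

With annotators set to "tokenize, ssplit,pos", a registered factory for "ner" would never be invoked, which keeps unused models from being loaded.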
      • isActive

        public final boolean isActive()
      • getLocale

        public final Locale getLocale()
      • getLanguage

        public final String getLanguage()
        The language supported by the NerModel. This method can be called before activation
        Returns:
        the ISO 639-1 language code (e.g. "en" for English)
      • getName

        public final String getName()
      • getPipeline

        public final edu.stanford.nlp.pipeline.AnnotationPipeline getPipeline()
      • getPosTag

        public final PosTag getPosTag​(String tag)
        Uses the #posTagset and adhocPosTags to return existing instances of PosTags. If not present it will create a new one and add it to adhocPosTags
        Parameters:
        tag - the String pos tag as returned by the #tagger
        Returns:
        the PosTag or null if the passed tag was blank AND blank (incl. null) tags are not mapped by the POS TagSet.
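The lookup-then-create behaviour described for getPosTag(String) might be sketched like this. TagLookupSketch and its nested Tag class are hypothetical simplifications of TagSet and PosTag; the use of a ConcurrentHashMap for the ad hoc tags is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simplification of the tag lookup.
class TagLookupSketch {

    /** Minimal stand-in for a tag such as PosTag. */
    static final class Tag {
        final String name;
        final boolean adhoc;

        Tag(String name, boolean adhoc) {
            this.name = name;
            this.adhoc = adhoc;
        }
    }

    private final Map<String, Tag> tagSet;                                // known tags
    private final Map<String, Tag> adhocTags = new ConcurrentHashMap<>(); // created on demand

    TagLookupSketch(Map<String, Tag> knownTags) {
        // Copy into a HashMap so that lookups with a null key are allowed.
        this.tagSet = new HashMap<>(knownTags);
    }

    Tag getTag(String tag) {
        Tag known = tagSet.get(tag);
        if (known != null) {
            return known;                 // mapped by the tag set (may include blank tags)
        }
        if (tag == null || tag.trim().isEmpty()) {
            return null;                  // blank and not mapped by the tag set
        }
        // Reuse an earlier ad hoc instance or create and remember a new one.
        return adhocTags.computeIfAbsent(tag, t -> new Tag(t, true));
    }
}
```

Caching the ad hoc tags means repeated occurrences of an unmapped tag string yield the same instance rather than a new object per token.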
      • getNerTag

        public final NerTag getNerTag​(String tag)
        Uses the NER tag set (see getNerTagset()) and the ad hoc NerTags to return existing instances of NerTags. If not present it will create a new one and add it to the ad hoc NerTags
        Parameters:
        tag - the String NER tag
        Returns:
        the NerTag, guaranteed to be not null
      • getRelationTag

        public final RelTag getRelationTag​(String tag)
        Uses the relation tag set (see getRelTagset()) and the ad hoc RelTags to return existing instances of RelTags. If not present it will create a new one and add it to the ad hoc RelTags
        Parameters:
        tag - the String relation tag
        Returns:
        the RelTag, guaranteed to be not null
      • deactivate

        public final void deactivate()
      • doDeactivate

        protected void doDeactivate()
        Deactivation hook