eu.clarin.weblicht.wlfxb.tc.api
Interface TextCorpus

All Known Implementing Classes:
TextCorpusStored, TextCorpusStreamed

public interface TextCorpus

Interface TextCorpus represents TCF TextCorpus annotations and corresponds to the TCF TextCorpus specification. These are linguistic annotations on written connected text. They are divided into annotation layers, where each layer represents a specific linguistic aspect. For example, a TextCorpus can contain a TokensLayer, a PosTagsLayer, a ConstituentParsingLayer, etc. In a TextCorpus, annotations from any layer usually annotate (directly or indirectly) Token annotations from the TokensLayer. The exception is the TextLayer, which is independent of any other layer. See also: TCF Format description.
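The token-anchored layering described above can be pictured with a small in-memory model. This is a hypothetical sketch, not WLFXB code: the class `LayeredCorpusSketch` and its methods are invented for illustration. A text layer holds the raw string independently, a tokens layer holds tokens, and a dependent layer (here part-of-speech tags) refers to tokens by index instead of copying their text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of TCF-style layering; these classes are not part of WLFXB.
final class LayeredCorpusSketch {
    // Independent layer: the raw text, not anchored on any other layer.
    final String text;
    // Base layer: the tokens that other layers annotate.
    final List<String> tokens = new ArrayList<>();
    // Dependent layer: part-of-speech tags anchored on tokens by index.
    final Map<Integer, String> posTags = new TreeMap<>();

    LayeredCorpusSketch(String text) {
        this.text = text;
    }

    // Adds a token and returns its index, which dependent layers use as an anchor.
    int addToken(String form) {
        tokens.add(form);
        return tokens.size() - 1;
    }

    // Annotates an existing token; rejects anchors that point to no token.
    void addPosTag(int tokenIndex, String tag) {
        if (tokenIndex < 0 || tokenIndex >= tokens.size()) {
            throw new IllegalArgumentException("no token at index " + tokenIndex);
        }
        posTags.put(tokenIndex, tag);
    }

    // Returns the tag of a token, or null if the token is not tagged.
    String posTagOf(int tokenIndex) {
        return posTags.get(tokenIndex);
    }
}
```

Rejecting tags whose anchor token does not exist mirrors the idea that most layers only make sense relative to the tokens layer.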

Author:
Yana Panchenko

Method Summary
 LexicalSemanticsLayer createAntonymyLayer()
          Creates empty antonymy layer in this TextCorpus.
 ConstituentParsingLayer createConstituentParsingLayer(String tagset)
          Creates empty ConstituentParsingLayer with the given tagset in this TextCorpus.
 DependencyParsingLayer createDependencyParsingLayer(boolean multipleGovernorsPossible, boolean emptyTokensPossible)
          Creates empty DependencyParsingLayer in this TextCorpus.
 DependencyParsingLayer createDependencyParsingLayer(String tagset, boolean multipleGovernorsPossible, boolean emptyTokensPossible)
          Creates empty DependencyParsingLayer with the given tagset in this TextCorpus.
 DiscourseConnectivesLayer createDiscourseConnectivesLayer()
          Creates empty DiscourseConnectivesLayer in this TextCorpus.
 DiscourseConnectivesLayer createDiscourseConnectivesLayer(String typeTagset)
          Creates empty DiscourseConnectivesLayer with the given tagset for connective types in this TextCorpus.
 GeoLayer createGeoLayer(String source, GeoLongLatFormat coordFormat)
          Creates empty GeoLayer in this TextCorpus.
 GeoLayer createGeoLayer(String source, GeoLongLatFormat coordFormat, GeoContinentFormat conitentFormat, GeoCountryFormat countryFormat, GeoCapitalFormat capitalFormat)
          Creates empty GeoLayer in this TextCorpus.
 LexicalSemanticsLayer createHyperonymyLayer()
          Creates empty hyperonymy layer in this TextCorpus.
 LexicalSemanticsLayer createHyponymyLayer()
          Creates empty hyponymy layer in this TextCorpus.
 LemmasLayer createLemmasLayer()
          Creates empty LemmasLayer in this TextCorpus.
 MatchesLayer createMatchesLayer(String queryLanguage, String queryString)
          Creates empty MatchesLayer in this TextCorpus, ready to be filled in with corpus match annotations.
 MorphologyLayer createMorphologyLayer()
          Creates empty MorphologyLayer in this TextCorpus.
 MorphologyLayer createMorphologyLayer(boolean hasSegmentation)
          Creates empty MorphologyLayer in this TextCorpus.
 MorphologyLayer createMorphologyLayer(boolean hasSegmentation, boolean hasCharOffsets)
          Creates empty MorphologyLayer in this TextCorpus.
 NamedEntitiesLayer createNamedEntitiesLayer(String entitiesType)
          Creates empty NamedEntitiesLayer with the given tagset for named entity types in this TextCorpus.
 OrthographyLayer createOrthographyLayer()
          Creates empty OrthographyLayer in this TextCorpus.
 PhoneticsLayer createPhotenicsLayer(String alphabet)
          Creates empty PhoneticsLayer with the given alphabet for phonetic transcriptions in this TextCorpus.
 PosTagsLayer createPosTagsLayer(String tagset)
          Creates empty PosTagsLayer with the given tagset in this TextCorpus.
 ReferencesLayer createReferencesLayer(String typetagset, String reltagset, String externalReferencesSource)
          Creates empty ReferencesLayer in this TextCorpus, ready to be filled in with references data.
 RelationsLayer createRelationsLayer(String type)
           
 SentencesLayer createSentencesLayer()
          Creates empty SentencesLayer in this TextCorpus.
 SentencesLayer createSentencesLayer(boolean hasCharOffsets)
          Creates empty SentencesLayer in this TextCorpus.
 LexicalSemanticsLayer createSynonymyLayer()
          Creates empty synonymy layer in this TextCorpus.
 TextLayer createTextLayer()
          Creates empty TextLayer in this TextCorpus.
 TextStructureLayer createTextStructureLayer()
          Creates empty TextStructureLayer in this TextCorpus.
 TokensLayer createTokensLayer()
          Creates empty TokensLayer in this TextCorpus.
 TokensLayer createTokensLayer(boolean hasCharOffsets)
          Creates empty TokensLayer in this TextCorpus.
 WordSensesLayer createWordSensesLayer(String source)
          Creates empty WordSensesLayer in this TextCorpus.
 WordSplittingLayer createWordSplittingLayer(String type)
          Creates empty WordSplittingLayer with the given type of the splitting in this TextCorpus.
 LexicalSemanticsLayer getAntonymyLayer()
          Gets antonymy layer of this TextCorpus.
 ConstituentParsingLayer getConstituentParsingLayer()
          Gets constituent parsing layer of this TextCorpus.
 DependencyParsingLayer getDependencyParsingLayer()
          Gets dependency parsing layer of this TextCorpus.
 DiscourseConnectivesLayer getDiscourseConnectivesLayer()
          Gets discourse connectives layer of this TextCorpus.
 GeoLayer getGeoLayer()
          Gets geo layer of this TextCorpus.
 LexicalSemanticsLayer getHyperonymyLayer()
          Gets hyperonymy layer of this TextCorpus.
 LexicalSemanticsLayer getHyponymyLayer()
          Gets hyponymy layer of this TextCorpus.
 String getLanguage()
          Gets the language of the text/tokens in this TextCorpus.
 List<TextCorpusLayer> getLayers()
          Gets all annotation layers of this TextCorpus.
 LemmasLayer getLemmasLayer()
          Gets lemmas layer of this TextCorpus.
 MatchesLayer getMatchesLayer()
          Gets matches layer of this TextCorpus.
 MorphologyLayer getMorphologyLayer()
          Gets morphology layer of this TextCorpus.
 NamedEntitiesLayer getNamedEntitiesLayer()
          Gets named entities layer of this TextCorpus.
 OrthographyLayer getOrthographyLayer()
          Gets orthography layer of this TextCorpus.
 PhoneticsLayer getPhoneticsLayer()
          Gets phonetics layer of this TextCorpus.
 PosTagsLayer getPosTagsLayer()
          Gets part-of-speech layer of this TextCorpus.
 ReferencesLayer getReferencesLayer()
          Gets references layer of this TextCorpus.
 RelationsLayer getRelationsLayer()
           
 SentencesLayer getSentencesLayer()
          Gets sentences layer of this TextCorpus.
 LexicalSemanticsLayer getSynonymyLayer()
          Gets synonymy layer of this TextCorpus.
 TextLayer getTextLayer()
          Gets text layer of this TextCorpus.
 TextStructureLayer getTextStructureLayer()
          Gets text structure layer of this TextCorpus.
 TokensLayer getTokensLayer()
          Gets tokens layer of this TextCorpus.
 WordSensesLayer getWordSensesLayer()
          Gets word senses layer of this TextCorpus.
 WordSplittingLayer getWordSplittingLayer()
          Gets word splitting layer of this TextCorpus.
 

Method Detail

getLanguage

String getLanguage()
Gets the language of the text/tokens in this TextCorpus.

Returns:
language of this TextCorpus.

getLayers

List<TextCorpusLayer> getLayers()
Gets all annotation layers of this TextCorpus.

Returns:
annotation layers.

getTextLayer

TextLayer getTextLayer()
Gets text layer of this TextCorpus.

Returns:
annotation layer containing text.

createTextLayer

TextLayer createTextLayer()
Creates empty TextLayer in this TextCorpus.

Returns:
annotation layer that has been created.

getTokensLayer

TokensLayer getTokensLayer()
Gets tokens layer of this TextCorpus.

Returns:
annotation layer containing tokens.

createTokensLayer

TokensLayer createTokensLayer()
Creates empty TokensLayer in this TextCorpus.

Returns:
annotation layer that has been created.

createTokensLayer

TokensLayer createTokensLayer(boolean hasCharOffsets)
Creates empty TokensLayer in this TextCorpus.

Parameters:
hasCharOffsets - true if the Token objects in this TokensLayer will contain information about their character offsets in the text, false otherwise.
Returns:
annotation layer that has been created.
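To illustrate what character offset information for tokens can look like, here is a hypothetical whitespace tokenizer that records each token's position in the source text. The class `OffsetTokenizerSketch` is invented for illustration, and the half-open `[start, end)` convention is a choice made for this sketch, not something taken from the TCF specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not WLFXB code: whitespace tokenization that records
// each token's character offsets as a half-open range [start, end).
final class OffsetTokenizerSketch {
    static final class Span {
        final String form;
        final int start;
        final int end;

        Span(String form, int start, int end) {
            this.form = form;
            this.start = start;
            this.end = end;
        }
    }

    static List<Span> tokenize(String text) {
        List<Span> spans = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i == n) {
                break;
            }
            int start = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            spans.add(new Span(text.substring(start, i), start, i));
        }
        return spans;
    }
}
```

With offsets stored, `text.substring(span.start, span.end)` recovers each token's surface form, even when tokens are separated by several spaces.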

getLemmasLayer

LemmasLayer getLemmasLayer()
Gets lemmas layer of this TextCorpus.

Returns:
layer containing lemma annotations on Token objects from TokensLayer.

createLemmasLayer

LemmasLayer createLemmasLayer()
Creates empty LemmasLayer in this TextCorpus.

Returns:
annotation layer that has been created.

getPosTagsLayer

PosTagsLayer getPosTagsLayer()
Gets part-of-speech layer of this TextCorpus.

Returns:
layer containing part-of-speech annotations on Token objects from TokensLayer.

createPosTagsLayer

PosTagsLayer createPosTagsLayer(String tagset)
Creates empty PosTagsLayer with the given tagset in this TextCorpus.

Parameters:
tagset - of the part-of-speech annotations.
Returns:
annotation layer that has been created.

getSentencesLayer

SentencesLayer getSentencesLayer()
Gets sentences layer of this TextCorpus.

Returns:
layer containing sentence boundary annotations on Token objects from TokensLayer.

createSentencesLayer

SentencesLayer createSentencesLayer()
Creates empty SentencesLayer in this TextCorpus.

Returns:
annotation layer that has been created.

createSentencesLayer

SentencesLayer createSentencesLayer(boolean hasCharOffsets)
Creates empty SentencesLayer in this TextCorpus.

Parameters:
hasCharOffsets - true if the Sentence objects in this SentencesLayer will contain information about their character offsets in the text, false otherwise.
Returns:
annotation layer that has been created.
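Since sentences are annotations on tokens, a sentence's character span can be derived from the offsets of its first and last token. The following hypothetical sketch (the class `SentenceSpanSketch` is invented here, and its boundary rule is deliberately naive) groups tokens into sentences after ".", "!" or "?" and computes each sentence's half-open `[start, end)` span.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not WLFXB code: derive sentence character spans
// from token offsets. starts[i] and ends[i] are the half-open offsets of token i.
final class SentenceSpanSketch {
    static List<int[]> sentenceSpans(String[] forms, int[] starts, int[] ends) {
        if (forms.length != starts.length || forms.length != ends.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        List<int[]> spans = new ArrayList<>();
        int first = 0; // index of the first token of the current sentence
        for (int i = 0; i < forms.length; i++) {
            // Naive boundary rule: a sentence ends after ".", "!" or "?",
            // or at the last token of the text.
            boolean boundary = forms[i].equals(".") || forms[i].equals("!") || forms[i].equals("?");
            if (boundary || i == forms.length - 1) {
                spans.add(new int[] {starts[first], ends[i]});
                first = i + 1;
            }
        }
        return spans;
    }
}
```

A real sentence splitter would handle abbreviations and other cases; the point here is only that sentence offsets follow from token offsets.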

getConstituentParsingLayer

ConstituentParsingLayer getConstituentParsingLayer()
Gets constituent parsing layer of this TextCorpus.

Returns:
layer containing constituent parsing annotations on Token objects from TokensLayer.

createConstituentParsingLayer

ConstituentParsingLayer createConstituentParsingLayer(String tagset)
Creates empty ConstituentParsingLayer with the given tagset in this TextCorpus.

Parameters:
tagset - of the parsing annotations.
Returns:
annotation layer that has been created.

getDependencyParsingLayer

DependencyParsingLayer getDependencyParsingLayer()
Gets dependency parsing layer of this TextCorpus.

Returns:
layer containing dependency parsing annotations on Token objects from TokensLayer.

createDependencyParsingLayer

DependencyParsingLayer createDependencyParsingLayer(boolean multipleGovernorsPossible,
                                                    boolean emptyTokensPossible)
Creates empty DependencyParsingLayer in this TextCorpus.

Parameters:
multipleGovernorsPossible - true if a dependent can be governed by more than one governor, false otherwise.
emptyTokensPossible - true if dependency annotations can contain empty tokens, false otherwise.
Returns:
annotation layer that has been created.

createDependencyParsingLayer

DependencyParsingLayer createDependencyParsingLayer(String tagset,
                                                    boolean multipleGovernorsPossible,
                                                    boolean emptyTokensPossible)
Creates empty DependencyParsingLayer with the given tagset in this TextCorpus.

Parameters:
tagset - of the functions between dependent and governor.
multipleGovernorsPossible - true if a dependent can be governed by more than one governor, false otherwise.
emptyTokensPossible - true if dependency annotations can contain empty tokens, false otherwise.
Returns:
annotation layer that has been created.
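The `multipleGovernorsPossible` flag distinguishes tree-shaped analyses (each dependent has at most one governor) from graph-shaped ones. The hypothetical check below, with an invented class `GovernorCheckSketch`, shows what the flag constrains; it models only this flag, not `emptyTokensPossible`, and says nothing about how WLFXB itself validates dependencies.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check mirroring the multipleGovernorsPossible flag: when the
// flag is false, every dependent token may have at most one governor.
final class GovernorCheckSketch {
    // An arc from a governor token index to a dependent token index.
    static final class Arc {
        final int governor;
        final int dependent;

        Arc(int governor, int dependent) {
            this.governor = governor;
            this.dependent = dependent;
        }
    }

    static boolean isValid(List<Arc> arcs, boolean multipleGovernorsPossible) {
        if (multipleGovernorsPossible) {
            return true; // any number of governors per dependent is allowed
        }
        Map<Integer, Integer> governorsPerDependent = new HashMap<>();
        for (Arc arc : arcs) {
            // Count governors seen so far for this dependent.
            int count = governorsPerDependent.merge(arc.dependent, 1, Integer::sum);
            if (count > 1) {
                return false;
            }
        }
        return true;
    }
}
```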

getMorphologyLayer

MorphologyLayer getMorphologyLayer()
Gets morphology layer of this TextCorpus.

Returns:
layer containing morphological analysis annotations on Token objects from TokensLayer.

createMorphologyLayer

MorphologyLayer createMorphologyLayer()
Creates empty MorphologyLayer in this TextCorpus.

Returns:
annotation layer that has been created.

createMorphologyLayer

MorphologyLayer createMorphologyLayer(boolean hasSegmentation)
Creates empty MorphologyLayer in this TextCorpus.

Parameters:
hasSegmentation - true if morphology annotations contain segmentation analysis, false otherwise.
Returns:
annotation layer that has been created.

createMorphologyLayer

MorphologyLayer createMorphologyLayer(boolean hasSegmentation,
                                      boolean hasCharOffsets)
Creates empty MorphologyLayer in this TextCorpus.

Parameters:
hasSegmentation - true if morphology annotations contain segmentation analysis, false otherwise.
hasCharOffsets - true if the MorphologyAnalysis objects in this layer will contain character offsets of the segments within the token, false otherwise.
Returns:
annotation layer that has been created.
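Character offsets for segmentation are positions inside the token rather than inside the whole text. The hypothetical sketch below (the class `SegmentOffsetsSketch` is invented for illustration) converts an ordered list of segments into token-relative half-open offsets. It assumes purely concatenative segmentation, where the segments spell out the token exactly; real morphological analyses are not always that simple.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not WLFXB code: turn an ordered segmentation of a token
// into character offsets relative to the token, as half-open [start, end) pairs.
final class SegmentOffsetsSketch {
    static List<int[]> offsets(String token, List<String> segments) {
        List<int[]> result = new ArrayList<>();
        int pos = 0;
        for (String seg : segments) {
            // Each segment must continue the token exactly where the previous one ended.
            if (!token.startsWith(seg, pos)) {
                throw new IllegalArgumentException("segment '" + seg + "' does not match token at " + pos);
            }
            result.add(new int[] {pos, pos + seg.length()});
            pos += seg.length();
        }
        // The segments together must cover the whole token.
        if (pos != token.length()) {
            throw new IllegalArgumentException("segments do not cover the token");
        }
        return result;
    }
}
```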

getNamedEntitiesLayer

NamedEntitiesLayer getNamedEntitiesLayer()
Gets named entities layer of this TextCorpus.

Returns:
layer containing named entity annotations on Token objects from TokensLayer.

createNamedEntitiesLayer

NamedEntitiesLayer createNamedEntitiesLayer(String entitiesType)
Creates empty NamedEntitiesLayer with the given tagset for named entity types in this TextCorpus.

Parameters:
entitiesType - tagset of the named entity annotations.
Returns:
annotation layer that has been created.

getReferencesLayer

ReferencesLayer getReferencesLayer()
Gets references layer of this TextCorpus.

Returns:
layer containing reference/coreference annotations on Token objects from TokensLayer.

createReferencesLayer

ReferencesLayer createReferencesLayer(String typetagset,
                                      String reltagset,
                                      String externalReferencesSource)
Creates empty ReferencesLayer in this TextCorpus, ready to be filled in with references data.

Parameters:
typetagset - tagset for the mention type values of the references (should be null if no types are defined)
reltagset - tagset for relation values between the references (should be null if no relations are defined)
externalReferencesSource - name of external source (should be null if entities from the external source are not referenced)
Returns:
annotation layer that has been created.
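A reference/coreference layer groups mentions (spans of tokens) into entities, and mention types are only meaningful when a type tagset has been declared. The following hypothetical model, with an invented class `CorefSketch`, illustrates the role of a null `typetagset`: untyped layers reject typed mentions. It is not how WLFXB stores references.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a coreference layer, not WLFXB code: entities group
// mentions, and each mention is a non-empty list of token indices.
final class CorefSketch {
    static final class Mention {
        final List<Integer> tokenIndices;
        final String type; // null when the mention is untyped

        Mention(List<Integer> tokenIndices, String type) {
            this.tokenIndices = tokenIndices;
            this.type = type;
        }
    }

    private final String typeTagset; // null means no mention types are defined
    private final List<List<Mention>> entities = new ArrayList<>();

    CorefSketch(String typeTagset) {
        this.typeTagset = typeTagset;
    }

    // Starts a new entity and returns its index.
    int newEntity() {
        entities.add(new ArrayList<>());
        return entities.size() - 1;
    }

    // Adds a mention to an entity; typed mentions need a declared type tagset.
    void addMention(int entity, List<Integer> tokenIndices, String type) {
        if (tokenIndices.isEmpty()) {
            throw new IllegalArgumentException("a mention needs at least one token");
        }
        if (type != null && typeTagset == null) {
            throw new IllegalStateException("no type tagset declared for this layer");
        }
        entities.get(entity).add(new Mention(List.copyOf(tokenIndices), type));
    }

    int mentionCount(int entity) {
        return entities.get(entity).size();
    }
}
```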

getRelationsLayer

RelationsLayer getRelationsLayer()

createRelationsLayer

RelationsLayer createRelationsLayer(String type)

getMatchesLayer

MatchesLayer getMatchesLayer()
Gets matches layer of this TextCorpus.

Returns:
layer containing match annotations on Token objects from TokensLayer.

createMatchesLayer

MatchesLayer createMatchesLayer(String queryLanguage,
                                String queryString)
Creates empty MatchesLayer in this TextCorpus, ready to be filled in with corpus match annotations.

Parameters:
queryLanguage - language of the query used to extract corpus matches from a corpus.
queryString - the query used to extract corpus matches from a corpus.
Returns:
annotation layer that has been created.

getWordSplittingLayer

WordSplittingLayer getWordSplittingLayer()
Gets word splitting layer of this TextCorpus.

Returns:
layer containing word splitting annotations (e.g. hyphenation) on Token objects from TokensLayer.

createWordSplittingLayer

WordSplittingLayer createWordSplittingLayer(String type)
Creates empty WordSplittingLayer with the given type of the splitting in this TextCorpus.

Parameters:
type - of the splitting, e.g. hyphenation.
Returns:
annotation layer that has been created.
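One way to picture a word splitting such as hyphenation is as a list of split positions inside a token. The hypothetical helper below (the class `WordSplitSketch` is invented, and representing splits as character indices is an assumption of this sketch) renders a token with a separator at each split point.

```java
// Hypothetical sketch, not WLFXB code: a word splitting stored as split
// positions inside a token (character indices where a new part starts).
final class WordSplitSketch {
    static String render(String token, int[] splitPoints, String separator) {
        StringBuilder sb = new StringBuilder();
        int prev = 0;
        for (int p : splitPoints) {
            // Split points must be strictly increasing and strictly inside the token.
            if (p <= prev || p >= token.length()) {
                throw new IllegalArgumentException("invalid split point " + p);
            }
            sb.append(token, prev, p).append(separator);
            prev = p;
        }
        sb.append(token.substring(prev)); // the last part
        return sb.toString();
    }
}
```

For example, rendering `"Silbentrennung"` with split points `{3, 6, 10}` and separator `"-"` yields the hyphenated form `Sil-ben-tren-nung`.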

getPhoneticsLayer

PhoneticsLayer getPhoneticsLayer()
Gets phonetics layer of this TextCorpus.

Returns:
layer containing phonetic transcriptions of Token objects from TokensLayer.

createPhotenicsLayer

PhoneticsLayer createPhotenicsLayer(String alphabet)
Creates empty PhoneticsLayer with the given alphabet for phonetic transcriptions in this TextCorpus.

Parameters:
alphabet - of the phonetic transcription annotations.
Returns:
annotation layer that has been created.

getGeoLayer

GeoLayer getGeoLayer()
Gets geo layer of this TextCorpus.

Returns:
layer containing geographical location annotations on Token objects from TokensLayer.

createGeoLayer

GeoLayer createGeoLayer(String source,
                        GeoLongLatFormat coordFormat)
Creates empty GeoLayer in this TextCorpus.

Parameters:
source - of the geographical coordinates.
coordFormat - format of the geographical coordinates.
Returns:
annotation layer that has been created.

createGeoLayer

GeoLayer createGeoLayer(String source,
                        GeoLongLatFormat coordFormat,
                        GeoContinentFormat conitentFormat,
                        GeoCountryFormat countryFormat,
                        GeoCapitalFormat capitalFormat)
Creates empty GeoLayer in this TextCorpus.

Parameters:
source - of the geographical coordinates.
coordFormat - format of the geographical coordinates.
conitentFormat - format of the continent (should be null if no continent is specified).
countryFormat - format of the country (should be null if no country is specified).
capitalFormat - format of the capital (should be null if no capital is specified).
Returns:
annotation layer that has been created.

getOrthographyLayer

OrthographyLayer getOrthographyLayer()
Gets orthography layer of this TextCorpus.

Returns:
layer containing correct orthographic spellings of misspelled Token objects from TokensLayer.

createOrthographyLayer

OrthographyLayer createOrthographyLayer()
Creates empty OrthographyLayer in this TextCorpus.

Returns:
annotation layer that has been created.

getTextStructureLayer

TextStructureLayer getTextStructureLayer()
Gets text structure layer of this TextCorpus.

Returns:
layer containing original text structure (such as paragraphs, lines, pages, etc.), anchored on Token objects from TokensLayer.

createTextStructureLayer

TextStructureLayer createTextStructureLayer()
Creates empty TextStructureLayer in this TextCorpus.

Returns:
annotation layer that has been created.

getSynonymyLayer

LexicalSemanticsLayer getSynonymyLayer()
Gets synonymy layer of this TextCorpus.

Returns:
layer containing synonyms of Lemma objects from LemmasLayer.

createSynonymyLayer

LexicalSemanticsLayer createSynonymyLayer()
Creates empty synonymy layer in this TextCorpus.

Returns:
annotation layer that has been created.

getAntonymyLayer

LexicalSemanticsLayer getAntonymyLayer()
Gets antonymy layer of this TextCorpus.

Returns:
layer containing antonyms of Lemma objects from LemmasLayer.

createAntonymyLayer

LexicalSemanticsLayer createAntonymyLayer()
Creates empty antonymy layer in this TextCorpus.

Returns:
annotation layer that has been created.

getHyponymyLayer

LexicalSemanticsLayer getHyponymyLayer()
Gets hyponymy layer of this TextCorpus.

Returns:
layer containing hyponyms of Lemma objects from LemmasLayer.

createHyponymyLayer

LexicalSemanticsLayer createHyponymyLayer()
Creates empty hyponymy layer in this TextCorpus.

Returns:
annotation layer that has been created.

getHyperonymyLayer

LexicalSemanticsLayer getHyperonymyLayer()
Gets hyperonymy layer of this TextCorpus.

Returns:
layer containing hyperonyms of Lemma objects from LemmasLayer.

createHyperonymyLayer

LexicalSemanticsLayer createHyperonymyLayer()
Creates empty hyperonymy layer in this TextCorpus.

Returns:
annotation layer that has been created.
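The synonymy, antonymy, hyponymy and hyperonymy layers all share the LexicalSemanticsLayer type and relate Lemma objects to other words. The hypothetical model below (the class `LexicalRelationsSketch` and its enum are invented for illustration) keeps one relation map per layer, keyed by lemma, which mirrors that shared shape without reproducing WLFXB's data structures.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model, not WLFXB code: one relation map per lexical-semantic
// layer (synonymy, antonymy, hyponymy, hyperonymy), each keyed by lemma.
final class LexicalRelationsSketch {
    enum Relation { SYNONYMY, ANTONYMY, HYPONYMY, HYPERONYMY }

    private final Map<Relation, Map<String, Set<String>>> layers = new EnumMap<>(Relation.class);

    // Records that `related` stands in the given relation to `lemma`
    // (e.g. HYPONYMY: "dog" is a hyponym of "animal").
    void add(Relation relation, String lemma, String related) {
        layers.computeIfAbsent(relation, r -> new TreeMap<>())
              .computeIfAbsent(lemma, l -> new TreeSet<>())
              .add(related);
    }

    // Returns the related words, or an empty set if none were recorded.
    Set<String> get(Relation relation, String lemma) {
        return layers.getOrDefault(relation, Map.of()).getOrDefault(lemma, Set.of());
    }
}
```

Because the relations are directional, recording a hyponym does not automatically record the inverse hyperonym; the sketch leaves that to the caller.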

getDiscourseConnectivesLayer

DiscourseConnectivesLayer getDiscourseConnectivesLayer()
Gets discourse connectives layer of this TextCorpus.

Returns:
layer containing discourse connectives annotations on Token objects from TokensLayer.

createDiscourseConnectivesLayer

DiscourseConnectivesLayer createDiscourseConnectivesLayer()
Creates empty DiscourseConnectivesLayer in this TextCorpus.

Returns:
annotation layer that has been created.

createDiscourseConnectivesLayer

DiscourseConnectivesLayer createDiscourseConnectivesLayer(String typeTagset)
Creates empty DiscourseConnectivesLayer with the given tagset for connective types in this TextCorpus.

Parameters:
typeTagset - tagset used to label the semantic types of the connectives.
Returns:
annotation layer that has been created.

getWordSensesLayer

WordSensesLayer getWordSensesLayer()
Gets word senses layer of this TextCorpus.

Returns:
layer containing word sense annotations on Token objects from TokensLayer.

createWordSensesLayer

WordSensesLayer createWordSensesLayer(String source)
Creates empty WordSensesLayer in this TextCorpus.

Parameters:
source - source from which the word senses are taken.
Returns:
annotation layer that has been created.


Copyright © 2013-2014 Department of Linguistics, Tübingen University. All Rights Reserved.