Package eu.clarin.weblicht.wlfxb.tc.api
Interface TextCorpus
All Known Implementing Classes:
TextCorpusStored, TextCorpusStreamed, TextCorpusStreamedWithReplaceableLayers

public interface TextCorpus

Interface TextCorpus represents TCF TextCorpus annotations and corresponds to the TCF TextCorpus specification. These annotations are linguistic annotations on written connected text. They are divided into annotation layers, where each layer represents a specific linguistic aspect. For example, a TextCorpus can contain a TokensLayer, PosTagsLayer, ConstituentParsingLayer, etc. In a TextCorpus, annotations from any layer usually annotate (directly or indirectly) Token annotations from the TokensLayer. An exception is the TextLayer, which is independent of any other layer.

See also: TCF Format description.

Author:
Yana Panchenko
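The layered model described above, where annotation layers point at tokens instead of copying their text, can be illustrated with a small sketch. StandOffSketch, its Token record, tagLayer and the "t_" ID prefix are hypothetical stand-ins, not wlfxb classes; the sketch only shows the idea of a tag layer keyed by token ID.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types, NOT the wlfxb API: they only illustrate how an
// annotation layer refers to tokens of the tokens layer by ID.
final class StandOffSketch {

    /** A token with an ID, playing the role of a Token in a TokensLayer. */
    record Token(String id, String text) {}

    /** Creates tokens with sequential IDs t_0, t_1, ... (ID scheme assumed). */
    static List<Token> tokenize(String... words) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            tokens.add(new Token("t_" + i, words[i]));
        }
        return tokens;
    }

    /** Builds a tag layer (e.g. POS tags) keyed by token ID; the token text is not copied. */
    static Map<String, String> tagLayer(List<Token> tokens, String... tags) {
        if (tags.length != tokens.size()) {
            throw new IllegalArgumentException("expected one tag per token");
        }
        Map<String, String> layer = new LinkedHashMap<>();
        for (int i = 0; i < tags.length; i++) {
            layer.put(tokens.get(i).id(), tags[i]);
        }
        return layer;
    }
}
```

For example, tagging the tokens of "Peter lacht" with the STTS tags NE and VVFIN yields a layer mapping t_0 to NE and t_1 to VVFIN; getting from a tag back to its word always goes through the tokens.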
Method Summary
Modifier and Type | Method | Description
LexicalSemanticsLayer | createAntonymyLayer() | Creates empty antonymy layer in this TextCorpus.
ChunksLayer | createChunksLayer(String entitiesType) | Creates empty ChunksLayer with the given tagset for chunk types in this TextCorpus.
ConstituentParsingLayer | createConstituentParsingLayer(String tagset) | Creates empty ConstituentParsingLayer with the given tagset in this TextCorpus.
DependencyParsingLayer | createDependencyParsingLayer(boolean multipleGovernorsPossible, boolean emptyTokensPossible) | Creates empty DependencyParsingLayer in this TextCorpus.
DependencyParsingLayer | createDependencyParsingLayer(String tagset, boolean multipleGovernorsPossible, boolean emptyTokensPossible) | Creates empty DependencyParsingLayer with the given tagset in this TextCorpus.
DiscourseConnectivesLayer | createDiscourseConnectivesLayer() | Creates empty DiscourseConnectivesLayer in this TextCorpus.
DiscourseConnectivesLayer | createDiscourseConnectivesLayer(String typeTagset) | Creates empty DiscourseConnectivesLayer with the given tagset for connective types in this TextCorpus.
GeoLayer | createGeoLayer(String source, GeoLongLatFormat coordFormat) | Creates empty GeoLayer in this TextCorpus.
GeoLayer | createGeoLayer(String source, GeoLongLatFormat coordFormat, GeoContinentFormat conitentFormat, GeoCountryFormat countryFormat, GeoCapitalFormat capitalFormat) | Creates empty GeoLayer in this TextCorpus.
LexicalSemanticsLayer | createHyperonymyLayer() | Creates empty hyperonymy layer in this TextCorpus.
LexicalSemanticsLayer | createHyponymyLayer() | Creates empty hyponymy layer in this TextCorpus.
LemmasLayer | createLemmasLayer() | Creates empty LemmasLayer in this TextCorpus.
MatchesLayer | createMatchesLayer(String queryLanguage, String queryString) | Creates empty MatchesLayer in this TextCorpus, ready to be filled in with the corpus match annotations.
MorphologyLayer | createMorphologyLayer() | Creates empty MorphologyLayer in this TextCorpus.
MorphologyLayer | createMorphologyLayer(boolean hasSegmentation) | Creates empty MorphologyLayer in this TextCorpus.
MorphologyLayer | createMorphologyLayer(boolean hasSegmentation, boolean hasCharOffsets) | Creates empty MorphologyLayer in this TextCorpus.
MorphologyLayer | createMorphologyLayer(String tagset) | Creates empty MorphologyLayer in this TextCorpus.
MorphologyLayer | createMorphologyLayer(String tagset, boolean hasSegmentation) | Creates empty MorphologyLayer in this TextCorpus.
MorphologyLayer | createMorphologyLayer(String tagset, boolean hasSegmentation, boolean hasCharOffsets) | Creates empty MorphologyLayer in this TextCorpus.
NamedEntitiesLayer | createNamedEntitiesLayer(String entitiesType) | Creates empty NamedEntitiesLayer with the given tagset for named entity types in this TextCorpus.
OrthographyLayer | createOrthographyLayer() | Creates empty OrthographyLayer in this TextCorpus.
PhoneticsLayer | createPhotenicsLayer(String alphabet) | Creates empty PhoneticsLayer with the given alphabet for phonetic transcriptions in this TextCorpus.
PosTagsLayer | createPosTagsLayer(String tagset) | Creates empty PosTagsLayer with the given tagset in this TextCorpus.
ReferencesLayer | createReferencesLayer(String typetagset, String reltagset, String externalReferencesSource) | Creates empty ReferencesLayer in this TextCorpus, ready to be filled in with the references data.
RelationsLayer | createRelationsLayer(String type) |
SentencesLayer | createSentencesLayer() | Creates empty SentencesLayer in this TextCorpus.
SentencesLayer | createSentencesLayer(boolean hasCharOffsets) | Creates empty SentencesLayer in this TextCorpus.
LexicalSemanticsLayer | createSynonymyLayer() | Creates empty synonymy layer in this TextCorpus.
TextLayer | createTextLayer() | Creates empty TextLayer in this TextCorpus.
TextSourceLayer | createTextSourceLayer() | Creates empty TextSourceLayer in this TextCorpus.
TextStructureLayer | createTextStructureLayer() | Creates empty TextStructureLayer in this TextCorpus.
TokensLayer | createTokensLayer() | Creates empty TokensLayer in this TextCorpus.
TokensLayer | createTokensLayer(boolean hasCharOffsets) | Creates empty TokensLayer in this TextCorpus.
TopologicalFieldsLayer | createTopologicalFieldsLayer(String tagset) | Creates empty TopologicalFieldsLayer with the given tagset in this TextCorpus.
WordSensesLayer | createWordSensesLayer(String source) | Creates empty WordSensesLayer in this TextCorpus.
WordSplittingLayer | createWordSplittingLayer(String type) | Creates empty WordSplittingLayer with the given splitting type in this TextCorpus.
LexicalSemanticsLayer | getAntonymyLayer() | Gets antonymy layer of this TextCorpus.
ChunksLayer | getChunksLayer() | Gets chunks layer of this TextCorpus.
ConstituentParsingLayer | getConstituentParsingLayer() | Gets constituent parsing layer of this TextCorpus.
DependencyParsingLayer | getDependencyParsingLayer() | Gets dependency parsing layer of this TextCorpus.
DiscourseConnectivesLayer | getDiscourseConnectivesLayer() | Gets discourse connectives layer of this TextCorpus.
GeoLayer | getGeoLayer() | Gets geo layer of this TextCorpus.
LexicalSemanticsLayer | getHyperonymyLayer() | Gets hyperonymy layer of this TextCorpus.
LexicalSemanticsLayer | getHyponymyLayer() | Gets hyponymy layer of this TextCorpus.
String | getLanguage() | Gets the language of the text/tokens in this TextCorpus.
List<TextCorpusLayer> | getLayers() | Gets all annotation layers of this TextCorpus.
LemmasLayer | getLemmasLayer() | Gets lemmas layer of this TextCorpus.
MatchesLayer | getMatchesLayer() | Gets matches layer of this TextCorpus.
MorphologyLayer | getMorphologyLayer() | Gets morphology layer of this TextCorpus.
NamedEntitiesLayer | getNamedEntitiesLayer() | Gets named entities layer of this TextCorpus.
OrthographyLayer | getOrthographyLayer() | Gets orthography layer of this TextCorpus.
PhoneticsLayer | getPhoneticsLayer() | Gets phonetics layer of this TextCorpus.
PosTagsLayer | getPosTagsLayer() | Gets part-of-speech layer of this TextCorpus.
ReferencesLayer | getReferencesLayer() | Gets references layer of this TextCorpus.
RelationsLayer | getRelationsLayer() |
SentencesLayer | getSentencesLayer() | Gets sentences layer of this TextCorpus.
LexicalSemanticsLayer | getSynonymyLayer() | Gets synonymy layer of this TextCorpus.
TextLayer | getTextLayer() | Gets text layer of this TextCorpus.
TextSourceLayer | getTextSourceLayer() | Gets text source layer of this TextCorpus.
TextStructureLayer | getTextStructureLayer() | Gets text structure layer of this TextCorpus.
TokensLayer | getTokensLayer() | Gets tokens layer of this TextCorpus.
TopologicalFieldsLayer | getTopologicalFieldsLayer() | Gets topological fields layer of this TextCorpus.
WordSensesLayer | getWordSensesLayer() | Gets word senses layer of this TextCorpus.
WordSplittingLayer | getWordSplittingLayer() | Gets word splitting layer of this TextCorpus.
Method Detail
-
getLanguage
String getLanguage()
Gets the language of the text/tokens in this TextCorpus.
Returns: language of this TextCorpus.
-
getLayers
List<TextCorpusLayer> getLayers()
Gets all annotation layers of this TextCorpus.
Returns: annotation layers.
-
getTextLayer
TextLayer getTextLayer()
Gets text layer of this TextCorpus.
Returns: annotation layer containing text.
-
createTextLayer
TextLayer createTextLayer()
Creates empty TextLayer in this TextCorpus.
Returns: annotation layer that has been created.
-
getTokensLayer
TokensLayer getTokensLayer()
Gets tokens layer of this TextCorpus.
Returns: annotation layer containing tokens.
-
createTokensLayer
TokensLayer createTokensLayer()
Creates empty TokensLayer in this TextCorpus.
Returns: annotation layer that has been created.
-
createTokensLayer
TokensLayer createTokensLayer(boolean hasCharOffsets)
Creates empty TokensLayer in this TextCorpus.
Parameters:
hasCharOffsets - true if the Token objects in this TokensLayer will contain their character offsets in the text, false otherwise.
Returns: annotation layer that has been created.
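When a TokensLayer is created with hasCharOffsets set to true, each token is expected to carry its position in the text. The hypothetical helper below (TokenOffsets is not part of wlfxb) sketches one way to obtain such offsets: it searches for each token string in the original text, left to right. It assumes half-open [start, end) offsets and tokens that occur verbatim in the text; the exact offset convention should be checked against the TCF specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of wlfxb: aligns token strings with the text
// they were produced from to obtain character offsets.
final class TokenOffsets {

    /** Half-open offsets [start, end) of one token in the text. */
    record Span(int start, int end) {}

    static List<Span> align(String text, List<String> tokens) {
        List<Span> spans = new ArrayList<>();
        int cursor = 0; // tokens are searched in order, never before the previous one
        for (String token : tokens) {
            int start = text.indexOf(token, cursor);
            if (start < 0) {
                throw new IllegalArgumentException("token not found in text: " + token);
            }
            int end = start + token.length();
            spans.add(new Span(start, end));
            cursor = end;
        }
        return spans;
    }
}
```

For the text "Peter lacht." and the tokens "Peter", "lacht" and ".", this yields [0, 5), [6, 11) and [11, 12). A tokenizer that normalizes its output (for example, rewriting quotation marks) would need a more tolerant alignment.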
-
getLemmasLayer
LemmasLayer getLemmasLayer()
Gets lemmas layer of this TextCorpus.
Returns: layer containing lemma annotations on Token objects from TokensLayer.
-
createLemmasLayer
LemmasLayer createLemmasLayer()
Creates empty LemmasLayer in this TextCorpus.
Returns: annotation layer that has been created.
-
getPosTagsLayer
PosTagsLayer getPosTagsLayer()
Gets part-of-speech layer of this TextCorpus.
Returns: layer containing part-of-speech annotations on Token objects from TokensLayer.
-
createPosTagsLayer
PosTagsLayer createPosTagsLayer(String tagset)
Creates empty PosTagsLayer with the given tagset in this TextCorpus.
Parameters:
tagset - tagset of the part-of-speech annotations.
Returns: annotation layer that has been created.
-
getTopologicalFieldsLayer
TopologicalFieldsLayer getTopologicalFieldsLayer()
Gets topological fields layer of this TextCorpus.
Returns: layer containing topological field annotations on Token objects from TokensLayer.
-
createTopologicalFieldsLayer
TopologicalFieldsLayer createTopologicalFieldsLayer(String tagset)
Creates empty TopologicalFieldsLayer with the given tagset in this TextCorpus.
Parameters:
tagset - tagset of the topological fields.
Returns: annotation layer that has been created.
-
getSentencesLayer
SentencesLayer getSentencesLayer()
Gets sentences layer of this TextCorpus.
Returns: layer containing sentence boundary annotations on Token objects from TokensLayer.
-
createSentencesLayer
SentencesLayer createSentencesLayer()
Creates empty SentencesLayer in this TextCorpus.
Returns: annotation layer that has been created.
-
createSentencesLayer
SentencesLayer createSentencesLayer(boolean hasCharOffsets)
Creates empty SentencesLayer in this TextCorpus.
Parameters:
hasCharOffsets - true if the Sentence objects in this SentencesLayer will contain their character offsets in the text, false otherwise.
Returns: annotation layer that has been created.
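Sentence offsets can be derived from token offsets: a sentence starts where its first token starts and ends where its last token ends. The hypothetical helper below (SentenceOffsets is not part of wlfxb) sketches this with a deliberately naive rule that closes a sentence after a ".", "!" or "?" token; real sentence splitters also handle abbreviations, ordinals and similar cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, NOT part of wlfxb: derives sentence offsets from token
// offsets using a naive end-of-sentence rule.
final class SentenceOffsets {

    private static final Set<String> SENTENCE_FINAL = Set.of(".", "!", "?");

    /**
     * Splits tokens into sentences after ".", "!" or "?" and returns each
     * sentence as {start, end}: the start of its first token and the end of
     * its last token. starts[i] and ends[i] are the offsets of tokens.get(i).
     */
    static List<int[]> sentenceSpans(List<String> tokens, int[] starts, int[] ends) {
        if (starts.length != tokens.size() || ends.length != tokens.size()) {
            throw new IllegalArgumentException("one start and one end offset per token expected");
        }
        List<int[]> spans = new ArrayList<>();
        int first = -1; // index of the first token of the currently open sentence
        for (int i = 0; i < tokens.size(); i++) {
            if (first < 0) {
                first = i;
            }
            if (SENTENCE_FINAL.contains(tokens.get(i))) {
                spans.add(new int[] {starts[first], ends[i]});
                first = -1;
            }
        }
        if (first >= 0) { // text ends without sentence-final punctuation
            spans.add(new int[] {starts[first], ends[tokens.size() - 1]});
        }
        return spans;
    }
}
```

For "Peter lacht. Maria auch." with token starts {0, 6, 11, 13, 19, 23} and ends {5, 11, 12, 18, 23, 24}, the sentences span [0, 12) and [13, 24).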
-
getConstituentParsingLayer
ConstituentParsingLayer getConstituentParsingLayer()
Gets constituent parsing layer of this TextCorpus.
Returns: layer containing constituent parsing annotations on Token objects from TokensLayer.
-
createConstituentParsingLayer
ConstituentParsingLayer createConstituentParsingLayer(String tagset)
Creates empty ConstituentParsingLayer with the given tagset in this TextCorpus.
Parameters:
tagset - tagset of the parsing annotations.
Returns: annotation layer that has been created.
-
getDependencyParsingLayer
DependencyParsingLayer getDependencyParsingLayer()
Gets dependency parsing layer of this TextCorpus.
Returns: layer containing dependency parsing annotations on Token objects from TokensLayer.
-
createDependencyParsingLayer
DependencyParsingLayer createDependencyParsingLayer(boolean multipleGovernorsPossible, boolean emptyTokensPossible)
Creates empty DependencyParsingLayer in this TextCorpus.
Parameters:
multipleGovernorsPossible - true if a dependent can be governed by more than one governor, false otherwise.
emptyTokensPossible - true if dependency annotations can contain empty tokens, false otherwise.
Returns: annotation layer that has been created.
-
createDependencyParsingLayer
DependencyParsingLayer createDependencyParsingLayer(String tagset, boolean multipleGovernorsPossible, boolean emptyTokensPossible)
Creates empty DependencyParsingLayer with the given tagset in this TextCorpus.
Parameters:
tagset - tagset of the functions between dependent and governor.
multipleGovernorsPossible - true if a dependent can be governed by more than one governor, false otherwise.
emptyTokensPossible - true if dependency annotations can contain empty tokens, false otherwise.
Returns: annotation layer that has been created.
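With multipleGovernorsPossible set to false, each dependent is expected to have a single governor. The hypothetical check below (GovernorCheck and its Arc record with token-ID fields are assumptions, not part of wlfxb) sketches how a producer might verify this before filling a layer created that way.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check, NOT part of wlfxb: verifies the single-governor
// property that multipleGovernorsPossible = false describes.
final class GovernorCheck {

    /** A dependency arc from a governor token to a dependent token (token IDs assumed). */
    record Arc(String governorId, String dependentId, String function) {}

    /** Returns true if no dependent is attached to two different governors. */
    static boolean singleGovernorPerDependent(List<Arc> arcs) {
        Map<String, String> governorOf = new HashMap<>();
        for (Arc arc : arcs) {
            String previous = governorOf.putIfAbsent(arc.dependentId(), arc.governorId());
            if (previous != null && !previous.equals(arc.governorId())) {
                return false; // a second, different governor for this dependent
            }
        }
        return true;
    }
}
```

A parse in which t_0 is attached both to t_1 and to t_3 fails this check, so it would call for a layer created with multipleGovernorsPossible set to true. The function labels (SB, OA) in the usage are example values only.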
-
getMorphologyLayer
MorphologyLayer getMorphologyLayer()
Gets morphology layer of this TextCorpus.
Returns: layer containing morphological analysis annotations on Token objects from TokensLayer.
-
createMorphologyLayer
MorphologyLayer createMorphologyLayer()
Creates empty MorphologyLayer in this TextCorpus.
Returns: annotation layer that has been created.
-
createMorphologyLayer
MorphologyLayer createMorphologyLayer(String tagset)
Creates empty MorphologyLayer in this TextCorpus.
Parameters:
tagset - tagset of the morphology annotations.
Returns: annotation layer that has been created.
-
createMorphologyLayer
MorphologyLayer createMorphologyLayer(boolean hasSegmentation)
Creates empty MorphologyLayer in this TextCorpus.
Parameters:
hasSegmentation - true if morphology annotations contain segmentation analysis.
Returns: annotation layer that has been created.
-
createMorphologyLayer
MorphologyLayer createMorphologyLayer(String tagset, boolean hasSegmentation)
Creates empty MorphologyLayer in this TextCorpus.
Parameters:
tagset - tagset of the morphology annotations.
hasSegmentation - true if morphology annotations contain segmentation analysis.
Returns: annotation layer that has been created.
-
createMorphologyLayer
MorphologyLayer createMorphologyLayer(boolean hasSegmentation, boolean hasCharOffsets)
Creates empty MorphologyLayer in this TextCorpus.
Parameters:
hasSegmentation - true if morphology annotations contain segmentation analysis.
hasCharOffsets - true if the MorphologyAnalysis objects in this layer will contain the character offsets of the segments within the token, false otherwise.
Returns: annotation layer that has been created.
-
createMorphologyLayer
MorphologyLayer createMorphologyLayer(String tagset, boolean hasSegmentation, boolean hasCharOffsets)
Creates empty MorphologyLayer in this TextCorpus.
Parameters:
tagset - tagset of the morphology annotations.
hasSegmentation - true if morphology annotations contain segmentation analysis.
hasCharOffsets - true if the MorphologyAnalysis objects in this layer will contain the character offsets of the segments within the token, false otherwise.
Returns: annotation layer that has been created.
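With hasCharOffsets set to true, segmentation offsets are relative to the token rather than to the whole text. The hypothetical helper below (SegmentOffsets is not part of wlfxb) sketches how such offsets follow from a surface segmentation, assuming the segments concatenate exactly to the token and offsets are half-open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of wlfxb: computes offsets of consecutive
// surface segments within a single token.
final class SegmentOffsets {

    /**
     * Returns {start, end} for each segment, relative to the token.
     * Fails if the segments do not concatenate exactly to the token.
     */
    static List<int[]> offsets(String token, List<String> segments) {
        List<int[]> result = new ArrayList<>();
        int pos = 0;
        for (String segment : segments) {
            if (!token.startsWith(segment, pos)) {
                throw new IllegalArgumentException(
                    "segment '" + segment + "' does not match token at position " + pos);
            }
            result.add(new int[] {pos, pos + segment.length()});
            pos += segment.length();
        }
        if (pos != token.length()) {
            throw new IllegalArgumentException("segments do not cover the whole token");
        }
        return result;
    }
}
```

For example, "Kinder" segmented as "Kind" + "er" gives the offsets [0, 4) and [4, 6).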
-
getNamedEntitiesLayer
NamedEntitiesLayer getNamedEntitiesLayer()
Gets named entities layer of this TextCorpus.
Returns: layer containing named entity annotations on Token objects from TokensLayer.
-
createNamedEntitiesLayer
NamedEntitiesLayer createNamedEntitiesLayer(String entitiesType)
Creates empty NamedEntitiesLayer with the given tagset for named entity types in this TextCorpus.
Parameters:
entitiesType - tagset of the named entity annotations.
Returns: annotation layer that has been created.
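Statistical taggers often output one BIO label per token (B-PER, I-PER, O, ...), whereas a named entity annotation covers one or more tokens. The hypothetical decoder below (BioSpans is not part of wlfxb; the BIO scheme and the PER/LOC labels are assumptions about the tagger's output) converts such labels into token index ranges that could then be added as entities.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder, NOT part of wlfxb: converts per-token BIO labels into
// entities spanning token index ranges.
final class BioSpans {

    /** An entity of the given type covering tokens first..last (inclusive). */
    record Entity(String type, int first, int last) {}

    static List<Entity> decode(List<String> labels) {
        List<Entity> entities = new ArrayList<>();
        String type = null; // type of the currently open entity, or null
        int start = -1;
        for (int i = 0; i < labels.size(); i++) {
            String label = labels.get(i);
            boolean continues = type != null && label.equals("I-" + type);
            if (type != null && !continues) { // close the open entity
                entities.add(new Entity(type, start, i - 1));
                type = null;
            }
            // B- always opens an entity; a stray I- without a matching open
            // entity is treated like B-, which is one common convention.
            if (label.startsWith("B-") || (label.startsWith("I-") && !continues)) {
                type = label.substring(2);
                start = i;
            }
        }
        if (type != null) {
            entities.add(new Entity(type, start, labels.size() - 1));
        }
        return entities;
    }
}
```

The labels B-PER, I-PER, O, B-LOC decode to a PER entity over tokens 0..1 and a LOC entity over token 3.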
-
createChunksLayer
ChunksLayer createChunksLayer(String entitiesType)
Creates empty ChunksLayer with the given tagset for chunk types in this TextCorpus.
Parameters:
entitiesType - tagset of the chunk annotations.
Returns: annotation layer that has been created.
-
getChunksLayer
ChunksLayer getChunksLayer()
Gets chunks layer of this TextCorpus.
Returns: layer containing chunk annotations on Token objects from TokensLayer.
-
getReferencesLayer
ReferencesLayer getReferencesLayer()
Gets references layer of this TextCorpus.
Returns: layer containing reference/coreference annotations on Token objects from TokensLayer.
-
createReferencesLayer
ReferencesLayer createReferencesLayer(String typetagset, String reltagset, String externalReferencesSource)
Creates empty ReferencesLayer in this TextCorpus, ready to be filled in with the references data.
Parameters:
typetagset - tagset for the mention type values of the references (should be null if no types are defined).
reltagset - tagset for relation values between the references (should be null if no relations are defined).
externalReferencesSource - name of the external source (should be null if entities from an external source are not referenced).
Returns: annotation layer that has been created.
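A references layer groups mentions that refer to the same entity. If a coreference resolver outputs pairwise mention-antecedent links, the groups (chains) can be built with union-find, as in the hypothetical sketch below. CorefChains is not part of wlfxb, and mentions are identified by plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the wlfxb API: merges pairwise coreference links
// into chains (one chain per entity) using union-find.
final class CorefChains {

    // Maps each mention to its parent; a root maps to itself. LinkedHashMap
    // keeps mentions in the order they were first seen.
    private final Map<String, String> parent = new LinkedHashMap<>();

    /** Returns the root mention of m's chain, compressing the path on the way. */
    private String find(String m) {
        parent.putIfAbsent(m, m);
        String root = m;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        while (!m.equals(root)) { // point every mention on the path at the root
            String next = parent.get(m);
            parent.put(m, root);
            m = next;
        }
        return root;
    }

    /** Records that mention refers to the same entity as antecedent. */
    void link(String mention, String antecedent) {
        String a = find(mention);
        String b = find(antecedent);
        if (!a.equals(b)) {
            parent.put(a, b);
        }
    }

    /** Returns all chains; mentions appear in first-seen order. */
    List<List<String>> chains() {
        Map<String, List<String>> byRoot = new LinkedHashMap<>();
        for (String m : new ArrayList<>(parent.keySet())) {
            byRoot.computeIfAbsent(find(m), k -> new ArrayList<>()).add(m);
        }
        return new ArrayList<>(byRoot.values());
    }
}
```

After link("er", "Peter"), link("ihn", "er") and link("sie", "Maria"), chains() returns [[er, Peter, ihn], [sie, Maria]]. Union-find keeps each merge close to constant time, which matters when a resolver emits many links.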
-
getRelationsLayer
RelationsLayer getRelationsLayer()
-
createRelationsLayer
RelationsLayer createRelationsLayer(String type)
-
getMatchesLayer
MatchesLayer getMatchesLayer()
Gets matches layer of this TextCorpus.
Returns: layer containing match annotations on Token objects from TokensLayer.
-
createMatchesLayer
MatchesLayer createMatchesLayer(String queryLanguage, String queryString)
Creates empty MatchesLayer in this TextCorpus, ready to be filled in with the corpus match annotations.
Parameters:
queryLanguage - language of the query used to extract matches from a corpus.
queryString - the query used to extract matches from a corpus.
Returns: annotation layer that has been created.
-
getWordSplittingLayer
WordSplittingLayer getWordSplittingLayer()
Gets word splitting layer of this TextCorpus.
Returns: layer containing split annotations (e.g. hyphenation) on Token objects from TokensLayer.
-
createWordSplittingLayer
WordSplittingLayer createWordSplittingLayer(String type)
Creates empty WordSplittingLayer with the given splitting type in this TextCorpus.
Parameters:
type - type of the splitting, e.g. hyphenation.
Returns: annotation layer that has been created.
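A word splitting such as hyphenation can be described by positions inside a token. The hypothetical helper below (WordSplits is not part of wlfxb, and encoding a splitting as ascending character indices is an assumption) turns such positions back into the parts of the token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of wlfxb: splits a token at the given
// character positions (e.g. hyphenation points).
final class WordSplits {

    /** Split points must be strictly ascending and lie strictly inside the token. */
    static List<String> apply(String token, int... splitPoints) {
        List<String> parts = new ArrayList<>();
        int prev = 0;
        for (int p : splitPoints) {
            if (p <= prev || p >= token.length()) {
                throw new IllegalArgumentException("invalid split point: " + p);
            }
            parts.add(token.substring(prev, p));
            prev = p;
        }
        parts.add(token.substring(prev)); // remainder after the last split point
        return parts;
    }
}
```

With the split points 3, 6 and 10, "Silbentrennung" yields Sil, ben, tren and nung.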
-
getPhoneticsLayer
PhoneticsLayer getPhoneticsLayer()
Gets phonetics layer of this TextCorpus.
Returns: layer containing phonetic transcriptions of Token objects from TokensLayer.
-
createPhotenicsLayer
PhoneticsLayer createPhotenicsLayer(String alphabet)
Creates empty PhoneticsLayer with the given alphabet for phonetic transcriptions in this TextCorpus.
Parameters:
alphabet - alphabet of the phonetic transcription annotations.
Returns: annotation layer that has been created.
-
getGeoLayer
GeoLayer getGeoLayer()
Gets geo layer of this TextCorpus.
Returns: layer containing geographical location annotations on Token objects from TokensLayer.
-
createGeoLayer
GeoLayer createGeoLayer(String source, GeoLongLatFormat coordFormat)
Creates empty GeoLayer in this TextCorpus.
Parameters:
source - source of the geographical coordinates.
coordFormat - format of the geographical coordinates.
Returns: annotation layer that has been created.
-
createGeoLayer
GeoLayer createGeoLayer(String source, GeoLongLatFormat coordFormat, GeoContinentFormat conitentFormat, GeoCountryFormat countryFormat, GeoCapitalFormat capitalFormat)
Creates empty GeoLayer in this TextCorpus.
Parameters:
source - source of the geographical coordinates.
coordFormat - format of the geographical coordinates.
conitentFormat - format of the continent (should be null if no continent is specified).
countryFormat - format of the country (should be null if no country is specified).
capitalFormat - format of the capital (should be null if no capital is specified).
Returns: annotation layer that has been created.
-
getOrthographyLayer
OrthographyLayer getOrthographyLayer()
Gets orthography layer of this TextCorpus.
Returns: layer containing correct orthographic spellings of misspelled Token objects from TokensLayer.
-
createOrthographyLayer
OrthographyLayer createOrthographyLayer()
Creates empty OrthographyLayer in this TextCorpus.
Returns: annotation layer that has been created.
-
getTextStructureLayer
TextStructureLayer getTextStructureLayer()
Gets text structure layer of this TextCorpus.
Returns: layer containing original text structure (such as paragraphs, lines, pages, etc.), anchored on Token objects from TokensLayer.
-
createTextStructureLayer
TextStructureLayer createTextStructureLayer()
Creates empty TextStructureLayer in this TextCorpus.
Returns: annotation layer that has been created.
-
getSynonymyLayer
LexicalSemanticsLayer getSynonymyLayer()
Gets synonymy layer of this TextCorpus.
Returns: layer containing synonyms of Lemma objects from LemmasLayer.
-
createSynonymyLayer
LexicalSemanticsLayer createSynonymyLayer()
Creates empty synonymy layer in this TextCorpus.
Returns: annotation layer that has been created.
-
getAntonymyLayer
LexicalSemanticsLayer getAntonymyLayer()
Gets antonymy layer of this TextCorpus.
Returns: layer containing antonyms of Lemma objects from LemmasLayer.
-
createAntonymyLayer
LexicalSemanticsLayer createAntonymyLayer()
Creates empty antonymy layer in this TextCorpus.
Returns: annotation layer that has been created.
-
getHyponymyLayer
LexicalSemanticsLayer getHyponymyLayer()
Gets hyponymy layer of this TextCorpus.
Returns: layer containing hyponyms of Lemma objects from LemmasLayer.
-
createHyponymyLayer
LexicalSemanticsLayer createHyponymyLayer()
Creates empty hyponymy layer in this TextCorpus.
Returns: annotation layer that has been created.
-
getHyperonymyLayer
LexicalSemanticsLayer getHyperonymyLayer()
Gets hyperonymy layer of this TextCorpus.
Returns: layer containing hyperonyms of Lemma objects from LemmasLayer.
-
createHyperonymyLayer
LexicalSemanticsLayer createHyperonymyLayer()
Creates empty hyperonymy layer in this TextCorpus.
Returns: annotation layer that has been created.
-
getDiscourseConnectivesLayer
DiscourseConnectivesLayer getDiscourseConnectivesLayer()
Gets discourse connectives layer of this TextCorpus.
Returns: layer containing discourse connective annotations on Token objects from TokensLayer.
-
createDiscourseConnectivesLayer
DiscourseConnectivesLayer createDiscourseConnectivesLayer()
Creates empty DiscourseConnectivesLayer in this TextCorpus.
Returns: annotation layer that has been created.
-
createDiscourseConnectivesLayer
DiscourseConnectivesLayer createDiscourseConnectivesLayer(String typeTagset)
Creates empty DiscourseConnectivesLayer in this TextCorpus.
Parameters:
typeTagset - tagset used to label semantic types of the connectives.
Returns: annotation layer that has been created.
-
getWordSensesLayer
WordSensesLayer getWordSensesLayer()
Gets word senses layer of this TextCorpus.
Returns: layer containing word sense annotations on Token objects from TokensLayer.
-
createWordSensesLayer
WordSensesLayer createWordSensesLayer(String source)
Creates empty WordSensesLayer in this TextCorpus.
Parameters:
source - source from which the word senses are taken.
Returns: annotation layer that has been created.
-
getTextSourceLayer
TextSourceLayer getTextSourceLayer()
Gets text source layer of this TextCorpus.
Returns: annotation layer containing text.
-
createTextSourceLayer
TextSourceLayer createTextSourceLayer()
Creates empty TextSourceLayer in this TextCorpus.
Returns: annotation layer that has been created.