class DaSimEstimator extends AnyRef
Instance Constructors
- new DaSimEstimator()
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- var _calcAvailability: Boolean
- var _pDistSimFeatureExtractionMethod: String
- var _pDistSimThreshold: Double
- var _pInitialFilterByObject: String
- var _pInitialFilterByPredicate: String
- var _pInitialFilterBySPARQL: String
- var _parameterVerboseProcess: Boolean
- var _seedLimit: Int
- var _sem_availability: Broadcast[Map[String, Double]]
- var _sem_distSimFeatureExtractionMethod: Broadcast[String]
- var _sem_entityCols: Broadcast[Array[String]]
- var _sem_featureExtractionMethod: Broadcast[String]
- var _sem_finalValCol: Broadcast[String]
- var _sem_importance: Broadcast[Map[String, Double]]
- var _sem_initialFilter: Broadcast[String]
- var _sem_reliability: Broadcast[Map[String, Double]]
- var _sem_similarityCols: Broadcast[Array[String]]
-
def
aggregateSimilarityScore(simDf: DataFrame, valueStreching: Boolean = true, availability: Map[String, Double] = null, importance: Map[String, Double] = null, reliability: Map[String, Double] = null): DataFrame
aggregate the similarity scores and weight them
- simDf
similarity dataframe with the feature-specific similarity scores
- valueStreching
optional parameter to stretch feature values; set by default
- availability
weighting by availability
- importance
user-specific weighting by importance
- reliability
optional possibility to influence the weighting by reliability
- returns
similarity dataframe with the aggregated and weighted final similarity score
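A minimal usage sketch of the aggregation step; the feature names in the weight maps are hypothetical, and `simDf` is assumed to be a similarity dataframe produced by calculateDaSimSimilarities:

```scala
// Sketch: weight an already computed similarity dataframe.
// "age" and "genre" are hypothetical feature names.
val aggregated = estimator.aggregateSimilarityScore(
  simDf = simDf,
  valueStreching = true,                            // stretch per-feature scores first
  availability = Map("age" -> 0.9, "genre" -> 0.5),
  importance   = Map("age" -> 0.2, "genre" -> 0.8),
  reliability  = null                               // not given: assumed equally distributed
)
```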
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
- def calculateAvailability(extractedFeaturesDF: DataFrame): Map[String, Double]
-
def
calculateDaSimSimilarities(candidatePairsDataFrame: DataFrame, extractedFeatureDataframe: DataFrame): DataFrame
calculate the weighted and feature-specific similarity scores with the new approach
- candidatePairsDataFrame
candidate pairs which span up the combinations to be calculated on
- extractedFeatureDataframe
extracted feature dataframe
- returns
dataframe with the pairwise similarity score for each feature
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @IntrinsicCandidate()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
gatherCandidatePairs(dataset: Dataset[Triple], seeds: DataFrame, _pDistSimFeatureExtractionMethod: String = "os", fastNotDistSim: Boolean = true, _pDistSimThreshold: Double = 0): DataFrame
use DistSim to gather promising candidates
- dataset
prefiltered KG for gathering candidates
- seeds
the seeds to be used for calculating promising candidates via DistSim
- _pDistSimFeatureExtractionMethod
method for the DistSim feature extractor
- _pDistSimThreshold
minimum similarity threshold for post-filtering DistSim pairs
- returns
dataframe with candidate pairs resulting from DistSim
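A hedged sketch of the candidate-gathering call, assuming a prefiltered `dataset` and a `seeds` dataframe obtained from gatherSeeds; the threshold value is illustrative:

```scala
// Sketch: gather promising candidate pairs via DistSim.
val candidatePairs = estimator.gatherCandidatePairs(
  dataset = dataset,                       // prefiltered Dataset[Triple]
  seeds = seeds,                           // one-column dataframe of seed URIs
  _pDistSimFeatureExtractionMethod = "os", // default feature extraction method
  _pDistSimThreshold = 0.5                 // drop pairs below this similarity
)
```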
-
def
gatherFeatures(ds: Dataset[Triple], candidates: DataFrame, sparqlFeatureExtractionQuery: String = null, predicateFilter: String = "", objectFilter: String = ""): DataFrame
feature extraction for extensive similarity scores; creates a dataframe with all features. Two options for feature gathering: either SparqlFrame or the SmartFeatureExtractor, which operates pivot-based
- ds
dataset of the KG
- candidates
candidate pairs from DistSim
- sparqlFeatureExtractionQuery
optional; if set, SparqlFrame is used instead of the SmartFeatureExtractor
- returns
dataframe with columns corresponding to the features and the URI identifier
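A sketch of both feature-gathering options; the SPARQL query and URIs below are illustrative placeholders, not part of the API:

```scala
// Option 1: pivot-based SmartFeatureExtractor (no SPARQL query supplied).
val features = estimator.gatherFeatures(ds, candidates)

// Option 2: SparqlFrame, used as soon as a query is supplied.
// The query is a hypothetical example.
val featuresViaSparql = estimator.gatherFeatures(
  ds, candidates,
  sparqlFeatureExtractionQuery =
    "SELECT ?movie ?genre WHERE { ?movie <http://example.org/genre> ?genre }"
)
```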
-
def
gatherSeeds(ds: Dataset[Triple], sparqlFilter: String = null, objectFilter: String = null, predicateFilter: String = null): DataFrame
internal method that collects seeds by either a SPARQL or an object filter
- ds
dataset of triples representing the input KG
- sparqlFilter
SPARQL filter applied to the initial KG
- objectFilter
filter the initial KG by the SPO object
- returns
dataframe with one column containing the string representation of the seed URIs
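Both seed-filter variants in a short sketch; the query and URI are hypothetical examples:

```scala
// Sketch: collect seeds either via a SPARQL filter ...
val seedsBySparql = estimator.gatherSeeds(
  ds, sparqlFilter = "SELECT ?s WHERE { ?s a <http://example.org/Movie> }")

// ... or, faster, via an SPO object filter.
val seedsByObject = estimator.gatherSeeds(
  ds, objectFilter = "http://example.org/Movie")
```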
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @IntrinsicCandidate()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @IntrinsicCandidate()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
listDistinctCandidates(candidatePairs: DataFrame): DataFrame
list all distinct elements which exist within the resulting URIs of DistSim
- candidatePairs
candidate pairs in a dataframe coming from DistSim
- returns
dataframe with one column containing the relevant URIs as strings
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
normSimColumns(df: DataFrame): DataFrame
optional method to normalize similarity columns
- df
similarity scored dataframe which needs to be normalized
- returns
normalized dataframe
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @IntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @IntrinsicCandidate()
- var pAvailability: Map[String, Double]
- var pImportance: Map[String, Double]
- var pReliability: Map[String, Double]
- var pSimilarityCalculationExecutionOrder: Array[String]
- var pSparqlFeatureExtractionQuery: Null
- var pValueStreching: Boolean
- def semantification(resultDf: DataFrame): RDD[Triple]
-
def
setAvailability(availability: Map[String, Double]): DaSimEstimator.this.type
specify manually the availability of each feature. This parameter weights the relevance of a feature's similarity based on its availability; it is possible that the availability is known. If the value is not given, it is assumed to be equally distributed
- returns
adjusted transformer
-
def
setDistSimFeatureExtractionMethod(distSimFeatureExtractionMethod: String): DaSimEstimator.this.type
DistSim feature extraction method: the feature extraction method for first guesses via DistSim
- distSimFeatureExtractionMethod
DistSim feature Extraction Method
- returns
adjusted transformer
-
def
setDistSimThreshold(distSimThreshold: Double): DaSimEstimator.this.type
DistSim minimum similarity threshold: the minimal similarity score used within DistSim to pre-filter promising candidates
- distSimThreshold
DistSim threshold min similarity score for prefilter candidate pairs
- returns
adjusted transformer
-
def
setImportance(importance: Map[String, Double]): DaSimEstimator.this.type
specify manually the importance of each feature. This parameter weights the relevance of a feature's similarity based on its importance; it allows the user to influence the weighting by personal preference
- returns
adjusted transformer
- def setLimitSeeds(seedLimit: Int): DaSimEstimator.this.type
-
def
setObjectFilter(objectFilter: String): DaSimEstimator.this.type
Filter the initial KG by the SPO object: filter the KG by the object of the SPO structure, an alternative that is faster than SPARQL
- objectFilter
string representing the object for spo filter
- returns
adjusted transformer
-
def
setPredicateFilter(predicateFilter: String): DaSimEstimator.this.type
Filter the initial KG by the SPO predicate: filter the KG by the predicate of the SPO structure, an alternative that is faster than SPARQL
- predicateFilter
string representing the predicate for the SPO filter
- returns
adjusted transformer
-
def
setReliability(reliability: Map[String, Double]): DaSimEstimator.this.type
specify manually the reliability of each feature. This parameter weights the relevance of a feature's similarity based on its reliability; it is possible that the reliability is known, for example that certain data might be influenced by fake news or is rarely updated. If the value is not given, it is assumed to be equally distributed
- returns
adjusted transformer
-
def
setSimilarityCalculationExecutionOrder(similarityCalculationExecutionOrder: Array[String]): DaSimEstimator.this.type
Execution order of similarity scores: here you can specify in which order the similarity values should be executed
- returns
adjusted transformer
-
def
setSimilarityValueStreching(valueStreching: Boolean): DaSimEstimator.this.type
Normalize similarity scores per feature: this parameter ensures that the feature-dedicated similarity scores are stretched/normed s.t. they all range from zero to one
- returns
adjusted transformer
-
def
setSparqlFilter(sparqlFilter: String): DaSimEstimator.this.type
candidate filtering via SPARQL: with this parameter you can reduce the list of candidates by use of a SPARQL query
- sparqlFilter
SPARQL filter applied ontop of input KG
- returns
adjusted transformer
- def setVerbose(verbose: Boolean): DaSimEstimator.this.type
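Since each setter returns the adjusted transformer, the estimator can be configured fluently. A sketch; the filter URI, weight map, and threshold below are illustrative assumptions:

```scala
// Sketch: chained configuration of a DaSimEstimator.
val estimator = new DaSimEstimator()
  .setObjectFilter("http://example.org/Movie")           // hypothetical seed filter
  .setDistSimFeatureExtractionMethod("os")
  .setDistSimThreshold(0.5)
  .setImportance(Map("genre" -> 0.8, "runtime" -> 0.2))  // hypothetical features
  .setSimilarityValueStreching(true)
  .setVerbose(true)
```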
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
transform(dataset: Dataset[Triple]): DataFrame
transform the KG into a similarity score dataframe based on the set parameters; the overall method encapsulating the other methods, and the one that should be used from outside
- dataset
knowledge graph
- returns
dataframe with the results of the similarity scores as a metagraph
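End to end, transform encapsulates the whole pipeline. A minimal sketch, assuming `triples` is a Dataset[Triple] already loaded by the surrounding Spark/SANSA setup and using a hypothetical seed filter:

```scala
// Sketch: run the full DaSim pipeline on a knowledge graph.
// `triples` is assumed to be a Dataset[Triple] loaded elsewhere.
val result: DataFrame = new DaSimEstimator()
  .setObjectFilter("http://example.org/Movie")  // hypothetical seed filter
  .setDistSimThreshold(0.5)
  .transform(triples)

result.show()  // aggregated pairwise similarity scores as a metagraph
```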
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated