class MinHashModel extends GenericSimilarityEstimatorModel
Instance Constructors
- new MinHashModel()
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
var
_featuresColumnNameDfA: String
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
-
var
_featuresColumnNameDfB: String
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
-
var
_inputCol: String
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
-
var
_similarityEstimationColumnName: String
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
-
var
_uriColumnNameDfA: String
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
-
var
_uriColumnNameDfB: String
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
checkColumnNames(dfA: DataFrame, dfB: DataFrame): Unit
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
-
def
createCrossJoinDF(dfA: DataFrame, dfB: DataFrame): DataFrame
Creates a cross-join DataFrame containing every pair of rows from dfA and dfB that can be compared.
- dfA
the first DataFrame, which must have a column for the URI and a column for the Vector-based feature representation
- dfB
the second DataFrame
- returns
a DataFrame with four columns: two for the URIs and two for the feature vectors. Its size is approximately the product of the sizes of both DataFrames.
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
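To make the shape of the cross-join result concrete, here is a minimal sketch using plain Scala collections in place of Spark DataFrames. The `Entity` and `Pair` types and the `crossJoin` function are hypothetical names introduced for illustration only; they are not part of this class, and feature vectors are simplified to sets of active feature indices.

```scala
// Hypothetical row type: a URI plus the indices of its non-zero features.
final case class Entity(uri: String, features: Set[Int])

// Hypothetical pair type mirroring the four columns described above.
final case class Pair(uriA: String, uriB: String, featuresA: Set[Int], featuresB: Set[Int])

// Pair every entity of a with every entity of b.
def crossJoin(a: Seq[Entity], b: Seq[Entity]): Seq[Pair] =
  for {
    x <- a
    y <- b
  } yield Pair(x.uri, y.uri, x.features, y.features)

val entitiesA = Seq(Entity("a1", Set(0, 1)), Entity("a2", Set(2)))
val entitiesB = Seq(Entity("b1", Set(1)), Entity("b2", Set(0, 2)), Entity("b3", Set(3)))

// 2 entities x 3 entities yields 6 candidate pairs.
val pairs = crossJoin(entitiesA, entitiesB)
```

The quadratic growth of `pairs` is the reason a threshold or a nearest-neighbor limit is usually applied afterwards.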
-
def
createNnDF(df: DataFrame, key: Vector, keyUri: String = "unknown"): DataFrame
Creates a DataFrame aligned with the result of createCrossJoinDF, except that every entity in df is paired only with the given key.
- df
the DataFrame of all entities to compare against, each with its URI and feature vector representation
- key
the feature vector representation of the key URI
- keyUri
optional name of the key URI
- returns
a DataFrame with four columns covering all desired pairs to compare: uriA, uriB, featuresA, featuresB
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
val
estimatorMeasureType: String
- Definition Classes
- MinHashModel → GenericSimilarityEstimatorModel
-
val
estimatorName: String
- Definition Classes
- MinHashModel → GenericSimilarityEstimatorModel
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
val
modelType: String
- Definition Classes
- GenericSimilarityEstimatorModel
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
nearestNeighbors(dfA: DataFrame, key: Vector, k: Int, keyUri: String = "unknown", valueColumn: String = "distCol", keepKeyUriColumn: Boolean = false): DataFrame
- dfA
a DataFrame with two columns: one for the URI as a String and one for the feature vector in indexed representation (for example, the output of Spark MLlib's CountVectorizer)
- key
the vector representation of the key, for example the output of MLlib's CountVectorizer
- k
the number of nearest neighbors to return
- keyUri
the name of the key URI, which is listed in the resulting DataFrame
- valueColumn
the name of the column holding the resulting similarities or distances
- keepKeyUriColumn
whether the key URI should be presented in its own column
- returns
a DataFrame of similar URIs with their distances/similarities to the given key
- Definition Classes
- MinHashModel → GenericSimilarityEstimatorModel
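As an illustration of what a nearest-neighbor query returns, the following sketch computes exact Jaccard distances with plain Scala collections. It assumes that the measure is the Jaccard distance (the quantity MinHash approximates) and that feature vectors can be simplified to sets of active indices. The names `jaccardDistance` and `nearest` are hypothetical and do not belong to this class, which works on Spark DataFrames.

```scala
// Jaccard distance between two sets of active feature indices: 1 - |A ∩ B| / |A ∪ B|.
// Two empty sets are treated as identical (distance 0.0); that is a choice made for this sketch.
def jaccardDistance(a: Set[Int], b: Set[Int]): Double = {
  val unionSize = (a union b).size
  if (unionSize == 0) 0.0 else 1.0 - (a intersect b).size.toDouble / unionSize
}

// Return the k candidates closest to key as (uri, distance), nearest first.
def nearest(candidates: Seq[(String, Set[Int])], key: Set[Int], k: Int): Seq[(String, Double)] =
  candidates
    .map { case (uri, features) => (uri, jaccardDistance(features, key)) }
    .sortBy(_._2)
    .take(k)

val candidates = Seq(
  "u1" -> Set(0, 1, 2),
  "u2" -> Set(0, 1),
  "u3" -> Set(5, 6)
)

// "u2" matches the key exactly, "u1" shares two of three indices, "u3" shares none.
val top2 = nearest(candidates, key = Set(0, 1), k = 2)
```

Sorting ascending suits a distance; for a similarity measure the order would be reversed.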
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
reduceJoinDf(simDf: DataFrame, threshold: Double): DataFrame
Reduces the generated DataFrame by a given threshold. Values that are worse than the threshold are discarded.
- simDf
the resulting DataFrame to reduce
- threshold
the threshold used for the reduction: an upper bound for distances and a lower bound for similarities
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
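The two directions of the threshold can be sketched as follows with plain Scala collections. `Scored` and `reduceByThreshold` are hypothetical names for illustration, and treating the bound as inclusive is an assumption of this sketch rather than documented behavior.

```scala
// Hypothetical scored pair: two URIs and their similarity or distance value.
final case class Scored(uriA: String, uriB: String, value: Double)

// Keep only rows that are at least as good as the threshold.
// isDistance = true  -> threshold is an upper bound (smaller is better).
// isDistance = false -> threshold is a lower bound (larger is better).
def reduceByThreshold(rows: Seq[Scored], threshold: Double, isDistance: Boolean): Seq[Scored] =
  if (isDistance) rows.filter(_.value <= threshold)
  else rows.filter(_.value >= threshold)

val scored = Seq(Scored("a", "b", 0.2), Scored("a", "c", 0.5), Scored("a", "d", 0.9))

val closeByDistance = reduceByThreshold(scored, 0.5, isDistance = true)
val highSimilarity = reduceByThreshold(scored, 0.5, isDistance = false)
```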
-
def
reduceNnDf(simDf: DataFrame, k: Int, keepKeyUriColumn: Boolean): DataFrame
Limits the number of presented nearest neighbors and orders the results.
- simDf
the DataFrame with evaluated similarities
- k
the number of nearest neighbors to search for
- keepKeyUriColumn
whether to keep the column of the fixed key URI; this decides between a resulting DataFrame of two or three columns
- returns
a DataFrame with the k nearest neighbor URIs in one column, the estimated similarity in a second column and, optionally, the URI compared against in a third column
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
-
def
setFeaturesColumnNameDfA(features_column_name: String): MinHashModel.this.type
- Definition Classes
- GenericSimilarityEstimatorModel
-
def
setFeaturesColumnNameDfB(features_column_name: String): MinHashModel.this.type
- Definition Classes
- GenericSimilarityEstimatorModel
-
def
setInputCol(inputCol: String): MinHashModel.this.type
- Definition Classes
- GenericSimilarityEstimatorModel
-
def
setNumHashTables(n: Int): MinHashModel.this.type
-
def
setSimilarityEstimationColumnName(valueColumnName: String): Unit
Sets the name of the similarity or distance column in the resulting DataFrame, e.g. "jaccardSimilarity".
- valueColumnName
the name of the resulting distance/similarity column
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
-
def
setUriColumnNameDfA(uri_column_name: String): MinHashModel.this.type
- Definition Classes
- GenericSimilarityEstimatorModel
-
def
setUriColumnNameDfB(uri_column_name: String): MinHashModel.this.type
- Definition Classes
- GenericSimilarityEstimatorModel
-
val
similarityEstimation: UserDefinedFunction
- Attributes
- protected
- Definition Classes
- GenericSimilarityEstimatorModel
-
def
similarityJoin(dfA: DataFrame, dfB: DataFrame, threshold: Double = -1.0, valueColumn: String = "distCol"): DataFrame
Creates a DataFrame that lists, for each pair of URIs, the assigned similarity or distance.
- dfA
a DataFrame with two columns: one for the URI as a String and one for the feature vector in indexed representation (for example, the output of Spark MLlib's CountVectorizer)
- dfB
the second DataFrame, whose entries are compared with those of dfA; it must be formatted like dfA
- threshold
the threshold on distance or similarity by which the final DataFrame is filtered
- valueColumn
the name of the resulting similarity/distance column
- returns
a DataFrame with the URI columns and the assigned similarity column
- Definition Classes
- MinHashModel → GenericSimilarityEstimatorModel
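For readers unfamiliar with the technique the class is named after, the following is a minimal, dependency-free sketch of how MinHash estimates the Jaccard similarity of one pair of feature sets. It is not this class's implementation: the names `randomCoefficients`, `signature` and `estimateSimilarity`, the linear hash family and the choice of prime are all assumptions made for illustration.

```scala
import scala.util.Random

// A known prime (2^31 - 1) used as the modulus of the hash family; chosen for this sketch.
val prime = 2147483647L

// Draw n coefficient pairs (a, b) for hash functions h(x) = (a * (x + 1) + b) mod prime,
// with a in [1, prime - 1] and b in [0, prime - 1].
def randomCoefficients(n: Int, seed: Long): Seq[(Long, Long)] = {
  val rnd = new Random(seed)
  Seq.fill(n)((1L + rnd.nextInt(Int.MaxValue - 1), rnd.nextInt(Int.MaxValue).toLong))
}

// MinHash signature: for each hash function, the minimum hash value over the set's elements.
def signature(features: Set[Int], coeffs: Seq[(Long, Long)]): Seq[Long] = {
  require(features.nonEmpty, "MinHash needs at least one active feature")
  coeffs.map { case (a, b) => features.map(x => (a * (x.toLong + 1) + b) % prime).min }
}

// The fraction of matching signature positions estimates the Jaccard similarity.
def estimateSimilarity(sigA: Seq[Long], sigB: Seq[Long]): Double =
  sigA.zip(sigB).count { case (x, y) => x == y }.toDouble / sigA.size

val coeffs = randomCoefficients(n = 256, seed = 42L)
val sigA = signature(Set(1, 2, 3, 4), coeffs)
val sigB = signature(Set(3, 4, 5, 6), coeffs)

// The exact Jaccard similarity of these two sets is 2 / 6; the estimate approximates it.
val estimate = estimateSimilarity(sigA, sigB)
```

More hash functions give a tighter estimate at the cost of longer signatures, which is the usual trade-off behind a setting such as the number of hash tables.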
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated @deprecated
- Deprecated
(Since version ) see corresponding Javadoc for more information.