object DistADUtil
This object gathers all the utilities needed for distributed anomaly detection.
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val LOG: Logger
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def calculateBiSectingKmeanClustering(data: DataFrame, numberOfClusters: Int): DataFrame
Runs bisecting k-means clustering on a given DataFrame.
- data
the given DataFrame
- numberOfClusters
the number of clusters
- returns
a DataFrame containing the cluster ID for each data point
- def calculateBiSectingKmeanClustering(data: RDD[Triple], numberOfClusters: Int): DataFrame
Runs bisecting k-means clustering on a given RDD[Triple].
- data
the given RDD[Triple]
- numberOfClusters
the number of clusters
- returns
a DataFrame containing the cluster ID for each data point
- def calculateMinHashLSHClustering(partialDataRDD: RDD[Triple], originalData: RDD[Triple], config: DistADConfig): DataFrame
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @IntrinsicCandidate()
- val convertStringToDouble: UserDefinedFunction
A UDF for converting numeric strings to doubles.
- def createDF(data: RDD[Triple]): DataFrame
Takes an RDD[Triple] and converts it to a DataFrame.
- data
the given RDD[Triple]
- returns
a DataFrame containing s, p, and o
- def createDFWithConversion(data: RDD[Triple]): DataFrame
Takes an RDD[Triple] and converts it to a DataFrame, converting numeric strings to doubles.
- data
the given RDD[Triple]
- returns
a DataFrame containing s, p, and o
- def createSpark(): SparkSession
Creates a Spark session and returns it.
- def detectNumberOfClusters(data: DataFrame, percentage: Double): Int
Samples the data and runs clustering with different values of K, then selects the K with the highest Silhouette value.
- data
the given DataFrame for clustering
- percentage
the percentage of the data to sample
- returns
the optimal K for clustering
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def filterAllTriplesWhichAtLeastHaveOneNumericLiterals(originalDataRDD: RDD[Triple], onlyLiteralDataRDD: RDD[Triple]): RDD[Triple]
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @IntrinsicCandidate()
- def getLocalName(x: Node): String
Takes a Node and returns its local name.
- x
a given Node
- returns
the local name
- def getNumber(a: String): Double
Takes a literal string value and extracts the number from it.
- a
the literal string value
- returns
the number from the literal string, or 0 otherwise
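As a rough illustration of the behaviour described for getNumber, the following sketch pulls the first number out of a literal string and falls back to 0. The object name GetNumberSketch, the helper extractNumber, and the regular expression are assumptions made for this example; the actual parsing rules of getNumber are not documented here.

```scala
import scala.util.Try

object GetNumberSketch {
  // Assumed format: an optionally signed integer or decimal number.
  private val NumberPattern = """[-+]?\d+(\.\d+)?""".r

  // Hypothetical stand-in for getNumber: returns the first number found
  // in the literal string, or 0.0 when the string contains none.
  def extractNumber(literal: String): Double =
    NumberPattern
      .findFirstIn(literal)
      .flatMap(s => Try(s.toDouble).toOption)
      .getOrElse(0.0)
}
```

With this sketch, a typed literal such as "42"^^xsd:integer yields 42.0, while a string with no digits yields 0.0.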
- def getOnlyLiteralObjects(nTriplesRDD: RDD[Triple]): RDD[Triple]
Takes an RDD[Triple] and keeps only the triples whose objects are literals.
- nTriplesRDD
the given RDD[Triple]
- returns
a new RDD[Triple] containing only literals
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @IntrinsicCandidate()
- def iqr(data: DataFrame, verbose: Boolean, anomalyListSize: Int): DataFrame
Anomaly detection method based on the interquartile range (IQR).
- data
a given DataFrame
- verbose
whether to show more internal output
- anomalyListSize
the minimum size a list of values must have to be considered for anomaly detection
- returns
a DataFrame
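The interquartile-range rule behind a method like iqr can be sketched on a plain Scala collection. This is not the DataFrame-based implementation: the object IqrSketch, the helper names, the conventional 1.5 multiplier, the linear-interpolation quartiles, and the reading of anomalyListSize as a minimum list size are all assumptions made for illustration.

```scala
object IqrSketch {
  // Linear-interpolation quantile of an already sorted, non-empty vector.
  private def quantile(sorted: Vector[Double], q: Double): Double = {
    val pos = q * (sorted.length - 1)
    val lower = math.floor(pos).toInt
    val upper = math.ceil(pos).toInt
    sorted(lower) + (pos - lower) * (sorted(upper) - sorted(lower))
  }

  // Returns the values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
  // Lists shorter than minListSize are skipped (assumed semantics of
  // the documented anomalyListSize parameter).
  def iqrOutliers(values: Seq[Double], minListSize: Int): Seq[Double] =
    if (values.isEmpty || values.size < minListSize) Seq.empty
    else {
      val sorted = values.sorted.toVector
      val q1 = quantile(sorted, 0.25)
      val q3 = quantile(sorted, 0.75)
      val iqr = q3 - q1
      val low = q1 - 1.5 * iqr
      val high = q3 + 1.5 * iqr
      values.filter(v => v < low || v > high)
    }
}
```

For the values 10, 12, 11, 13, 12, 11, 100 the quartiles are 11 and 12.5, so the fences are 8.75 and 14.75 and only 100 is flagged.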
- def isAllDigits(x: String): Boolean
Checks whether the given string contains only digits.
- x
the string value
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isNumeric(x: String): Boolean
Decides whether the given literal string is a numeric literal.
- x
the literal string
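The two string checks, isAllDigits and isNumeric, might look roughly like the sketch below. The object NumericCheckSketch, the helper names, and the accepted number format (optional sign, optional fraction, optional exponent) are assumptions; the documented methods may accept a different set of strings.

```scala
object NumericCheckSketch {
  // Assumed numeric format: optional sign, digits, optional fraction,
  // optional exponent.
  private val NumericPattern = """[-+]?\d+(\.\d+)?([eE][-+]?\d+)?"""

  // True when the string is non-empty and consists only of ASCII digits.
  def allDigits(x: String): Boolean =
    x.nonEmpty && x.forall(c => c >= '0' && c <= '9')

  // True when the whole trimmed string matches the assumed numeric format.
  def looksNumeric(x: String): Boolean =
    x.trim.matches(NumericPattern)
}
```

A regular expression is used instead of String.toDouble so that inputs such as "NaN" or "1d", which the JVM parser accepts, are not treated as numeric.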
- def mad(data: DataFrame, verbose: Boolean, anomalyListSize: Int): DataFrame
Anomaly detection method based on Mean Absolute Deviation (MAD).
- data
a given DataFrame
- verbose
whether to show more internal output
- anomalyListSize
the minimum size a list of values must have to be considered for anomaly detection
- returns
a DataFrame
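A mean-absolute-deviation check can be sketched on a plain collection as follows. The object MadSketch, the helper names, the decision rule (distance from the mean greater than a multiple of the MAD), and the default threshold of 2.0 are assumptions chosen for illustration; the documentation above does not state which rule or cutoff mad applies.

```scala
object MadSketch {
  // Mean absolute deviation: the average of |x - mean| over all values.
  // Expects a non-empty sequence.
  def meanAbsoluteDeviation(values: Seq[Double]): Double = {
    val mean = values.sum / values.size
    values.map(v => math.abs(v - mean)).sum / values.size
  }

  // Flags values whose distance from the mean exceeds threshold * MAD.
  // The threshold and the minListSize semantics are assumptions.
  def madOutliers(
      values: Seq[Double],
      minListSize: Int,
      threshold: Double = 2.0
  ): Seq[Double] =
    if (values.isEmpty || values.size < minListSize) Seq.empty
    else {
      val mean = values.sum / values.size
      val mad = meanAbsoluteDeviation(values)
      if (mad == 0.0) Seq.empty // all values identical, nothing to flag
      else values.filter(v => math.abs(v - mean) > threshold * mad)
    }
}
```

For the values 1, 2, 3, 4, 100 the mean is 22 and the mean absolute deviation is 31.2, so with the assumed threshold only 100 lies further than 62.4 from the mean.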
- def merge[A, B](input: List[Map[A, B]]): Map[A, List[B]]
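The merge signature carries no description. Judging only from its types, one plausible reading is that it collects, for each key, every value that key maps to across the input maps. The sketch below implements that reading; the object MergeSketch and the behaviour itself are assumptions, not confirmed by the documentation.

```scala
object MergeSketch {
  // For every key, gathers the values it maps to across all input maps,
  // in the order the maps appear in the list.
  def merge[A, B](input: List[Map[A, B]]): Map[A, List[B]] =
    input
      .flatMap(_.toList)
      .groupBy(_._1)
      .map { case (key, pairs) => key -> pairs.map(_._2) }
}
```

Under this reading, merging Map("a" -> 1, "b" -> 2) with Map("a" -> 3) gives "a" mapped to List(1, 3) and "b" mapped to List(2).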
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @IntrinsicCandidate()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @IntrinsicCandidate()
- val objList: List[String]
- def propClustering(triplesWithNumericLiteral: RDD[Triple]): RDD[(String, Set[(String, String, Double)])]
- def propWithSubject(a: RDD[Triple]): RDD[(String, String)]
- def readData(spark: SparkSession, input: String): RDD[Triple]
Reads the file into memory in a distributed manner, based on the input file extension.
- spark
the Spark session
- input
the path of the input file
- returns
RDD[Triple]
- def search(a: Double, b: Array[Double]): Boolean
Checks whether the given array contains the given number.
- a
the given number
- b
the given array
- returns
true if the number is in the array, false otherwise
- def searchEdge(x: String, y: List[String]): Boolean
Decides whether the given list of strings contains the given string.
- x
the given string
- y
the list of strings
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- def triplesWithNumericLit(objLit: RDD[Triple]): RDD[Triple]
Takes an RDD[Triple] and keeps only the triples with numeric literals.
- objLit
the given RDD[Triple]
- returns
a new RDD[Triple] containing only numeric literals
- def triplesWithNumericLitWithTypeIgnoreEndingWithID(data: RDD[Triple]): RDD[Triple]
Takes an RDD[Triple] and keeps only the triples with numeric literals, based on their data types. It also ignores all predicates that end with "ID".
- data
the given RDD[Triple]
- returns
a new RDD[Triple] containing only numeric literals
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- def writeAnomaliesToFile(data: List[String], path: String): Unit
Writes a List[String] to a file at the given path. It handles both HDFS and the local file system.
- data
the data that should be written to the file
- path
the path of the output file
- def writeToFile(path: String, data: DataFrame): Unit
Writes a DataFrame to a file at the given path. It handles both HDFS and the local file system.
- path
the path of the output file
- data
the DataFrame that should be written to the file
- def zscore(data: DataFrame, verbose: Boolean, anomalyListSize: Int): DataFrame
Anomaly detection method based on the Z-score.
- data
a given DataFrame
- verbose
whether to show more internal output
- anomalyListSize
the minimum size a list of values must have to be considered for anomaly detection
- returns
a DataFrame
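The Z-score rule can be sketched on a plain collection: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The object ZScoreSketch, the helper name, the use of the population standard deviation, and the default cutoff of 3.0 are assumptions; the cutoff used by zscore is not stated in this documentation.

```scala
object ZScoreSketch {
  // Returns the values whose absolute z-score exceeds `threshold`.
  // Lists shorter than minListSize are skipped (assumed semantics of
  // the documented anomalyListSize parameter).
  def zScoreOutliers(
      values: Seq[Double],
      minListSize: Int,
      threshold: Double = 3.0
  ): Seq[Double] =
    if (values.isEmpty || values.size < minListSize) Seq.empty
    else {
      val mean = values.sum / values.size
      // Population standard deviation (divides by n, not n - 1).
      val variance = values.map(v => math.pow(v - mean, 2)).sum / values.size
      val std = math.sqrt(variance)
      if (std == 0.0) Seq.empty // all values identical, nothing to flag
      else values.filter(v => math.abs((v - mean) / std) > threshold)
    }
}
```

With the population standard deviation, a single value among n can reach a z-score of at most (n - 1) / sqrt(n), so a cutoff of 3.0 can only flag anything in lists of at least 11 values. This is one reason a minimum list size is a sensible guard.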
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated