Class ClusterResult


  • public class ClusterResult
    extends Object
    Since:
    3.0.0
    Author:
    Kilian
    • Constructor Detail

      • ClusterResult

        public ClusterResult(int[] clusterIndex,
                             Hash[] hashes)
    • Method Detail

      • printInformation

        public void printInformation(boolean includeSilhouetteCoefficient)
      • getCluster

        public List<Hash> getCluster(int cluster)
      • getClusterData

        public int[] getClusterData()
      • getSumSquaredError

        public double getSumSquaredError(int cluster)
      • getSumSquaredError

        public double getSumSquaredError()
      • getSilhouetteCoef

        public double getSilhouetteCoef(int cluster)
      • getBestFitCluster

        public int getBestFitCluster(Hash testHash)
        Returns the index of the cluster whose centroid is most similar to the supplied hash.
        Parameters:
        testHash - the hash to check against the clusters
        Returns:
        the index (category) of the best-matching cluster
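
        The following is a minimal sketch of a nearest-centroid lookup, not the library's implementation. It assumes 64-bit hashes stored as long values, plain long centroids instead of FuzzyHash objects, and normalized Hamming distance as the similarity measure. The class BestFitSketch and its method names are hypothetical.

```java
/** Illustrative stand-in for a nearest-centroid lookup; not the library's code. */
final class BestFitSketch {

    /** Normalized Hamming distance between two 64-bit hashes, in [0, 1]. */
    static double normalizedHammingDistance(long a, long b) {
        return Long.bitCount(a ^ b) / 64.0;
    }

    /** Returns the index of the centroid with the smallest distance to testHash. */
    static int bestFitCluster(long testHash, long[] centroids) {
        if (centroids.length == 0) {
            throw new IllegalArgumentException("at least one cluster is required");
        }
        int best = 0;
        double bestDistance = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = normalizedHammingDistance(testHash, centroids[c]);
            if (d < bestDistance) { // ties keep the lower cluster index
                bestDistance = d;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed centroids, chosen only for this example.
        long[] centroids = {0x0000_0000_0000_00FFL, 0xFFFF_FFFF_0000_0000L};
        System.out.println(bestFitCluster(0x0000_0000_0000_00F0L, centroids)); // prints 0
    }
}
```

        The scan is linear in the number of clusters, which is usually small compared to the number of hashes.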
      • lookupClusterIdForKnownHash

        public int lookupClusterIdForKnownHash(Hash testHash)
        Returns the cluster index for a hash that was used during clustering.

        This method returns the same value as getBestFitCluster(Hash) but is faster.

        Parameters:
        testHash - the hash whose cluster index is looked up
        Returns:
        the index of the cluster this hash belongs to
        Throws:
        NullPointerException - if the hash was not used during clustering
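
        One plausible reason such a lookup is faster and throws NullPointerException for unknown hashes is a map from hash to cluster index whose Integer result is unboxed. The sketch below illustrates that pattern under this assumption; it is not taken from the library's source. Hashes are modelled as long values, and the class KnownHashLookupSketch and the helper lookupOrDefault are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in for a precomputed hash-to-cluster table; not the library's code. */
final class KnownHashLookupSketch {

    private final Map<Long, Integer> clusterByHash = new HashMap<>();

    /** clusterIndex[i] is the cluster assigned to hashes[i]. */
    KnownHashLookupSketch(int[] clusterIndex, long[] hashes) {
        if (clusterIndex.length != hashes.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        for (int i = 0; i < hashes.length; i++) {
            clusterByHash.put(hashes[i], clusterIndex[i]);
        }
    }

    /** Constant-time lookup; unboxing the null result for an unknown hash throws NullPointerException. */
    int lookupClusterIdForKnownHash(long hash) {
        return clusterByHash.get(hash);
    }

    /** Hypothetical safer variant that returns a fallback value for unknown hashes. */
    int lookupOrDefault(long hash, int fallback) {
        return clusterByHash.getOrDefault(hash, fallback);
    }

    public static void main(String[] args) {
        KnownHashLookupSketch lookup =
                new KnownHashLookupSketch(new int[] {0, 1, 0}, new long[] {10L, 20L, 30L});
        System.out.println(lookup.lookupClusterIdForKnownHash(20L)); // prints 1
        System.out.println(lookup.lookupOrDefault(99L, -1));         // prints -1
    }
}
```

        Callers that cannot guarantee the hash was part of the clustered data may prefer getBestFitCluster(Hash), which accepts arbitrary hashes.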
      • clusterIndexToDataIndex

        public List<Integer> clusterIndexToDataIndex(int clusterIndex)
      • indexToCluster

        public int indexToCluster(int index)
        Returns the cluster index for the data point at the given index.
        Parameters:
        index - the position the data point had in the input when the cluster method was called
        Returns:
        the cluster index
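
        indexToCluster and clusterIndexToDataIndex read as two directions of the same mapping. The sketch below assumes both are derived from the int[] clusterIndex array passed to the constructor, where position i holds the cluster of data point i. This is an interpretation of the signatures, not confirmed behavior, and the class IndexMappingSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in for the two index mappings; not the library's code. */
final class IndexMappingSketch {

    /** Data index to cluster index: a direct array read. */
    static int indexToCluster(int[] clusterIndex, int index) {
        return clusterIndex[index];
    }

    /** Cluster index to data indices: every position assigned to that cluster, in input order. */
    static List<Integer> clusterIndexToDataIndex(int[] clusterIndex, int cluster) {
        List<Integer> dataIndices = new ArrayList<>();
        for (int i = 0; i < clusterIndex.length; i++) {
            if (clusterIndex[i] == cluster) {
                dataIndices.add(i);
            }
        }
        return dataIndices;
    }

    public static void main(String[] args) {
        int[] clusterIndex = {0, 2, 0, 1, 2}; // assumed assignment of five data points
        System.out.println(indexToCluster(clusterIndex, 3));          // prints 1
        System.out.println(clusterIndexToDataIndex(clusterIndex, 2)); // prints [1, 4]
    }
}
```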
      • getCenteroid

        public FuzzyHash getCenteroid(int cluster)
        Returns the fuzzy hash representing the specified cluster.
        Parameters:
        cluster - the cluster index
        Returns:
        the centroid of the cluster
      • getPotentialFits

        public Map<Integer,Double> getPotentialFits(Hash testHash,
                                                    double sigma)
        Determines which clusters are likely to contain the closest neighbor of this hash.
        Parameters:
        testHash - the hash to check which cluster it belongs to
        sigma - a stretch factor controlling how much deviation between a cluster center and the hash is tolerated, relative to the range of distances within that cluster. For hashes from the original dataset, a sigma of 1 includes the best-fit cluster with 100% certainty.
        Returns:
        the cluster indices where a match is most likely
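
        The role of sigma can be illustrated with a simple radius test. The sketch below assumes that each cluster's "range of distances" is the largest distance from any member to its centroid, and that a cluster qualifies when the distance from the hash to the centroid is at most sigma times that radius. Under this assumption every hash from the original dataset lies within its own cluster's radius, so sigma = 1 always includes its cluster. The source does not say what the Double values of the returned map mean; here they are simply the measured distances. Hashes are modelled as long values, and the class PotentialFitsSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative stand-in for a sigma-scaled radius test; not the library's code. */
final class PotentialFitsSketch {

    /** Normalized Hamming distance between two 64-bit hashes, in [0, 1]. */
    static double distance(long a, long b) {
        return Long.bitCount(a ^ b) / 64.0;
    }

    /** Largest distance from any cluster member to the centroid. */
    static double radius(long centroid, long[] members) {
        double max = 0.0;
        for (long member : members) {
            max = Math.max(max, distance(centroid, member));
        }
        return max;
    }

    /** Maps each qualifying cluster index to the distance between testHash and its centroid. */
    static Map<Integer, Double> potentialFits(long testHash, long[] centroids,
                                              double[] radii, double sigma) {
        Map<Integer, Double> fits = new LinkedHashMap<>();
        for (int c = 0; c < centroids.length; c++) {
            double d = distance(testHash, centroids[c]);
            if (d <= sigma * radii[c]) { // sigma stretches the allowed radius
                fits.put(c, d);
            }
        }
        return fits;
    }

    public static void main(String[] args) {
        long[] centroids = {0x0L, 0xFFFF_FFFF_FFFF_FFFFL}; // assumed centroids
        double[] radii = {0.125, 0.25};                    // assumed cluster radii
        System.out.println(potentialFits(0xFFL, centroids, radii, 1.0)); // prints {0=0.125}
    }
}
```

        A larger sigma returns more candidate clusters and lowers the risk of missing the true nearest neighbor, at the cost of searching more clusters afterwards.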