Class ClusterResult
- java.lang.Object
-
- dev.brachtendorf.jimagehash.datastructures.ClusterResult
-
public class ClusterResult extends Object
- Since:
- 3.0.0
- Author:
- Kilian
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
protected int[] clusterIndex: Keeps track of which cluster each data point belongs to
protected HashMap<Integer,FuzzyHash> clusters
protected HashMap<Integer,List<Integer>> entriesInCluster: Maps each cluster index (key) to the data indices (value) in that cluster
protected HashMap<Hash,Integer> entryToDataIndex
protected int numberOfClusters
protected HashMap<Integer,DoubleSummaryStatistics> stats: Summary statistics (min, max, and average) for each cluster
-
Constructor Summary
Constructors (Constructor, Description):
ClusterResult(int[] clusterIndex, Hash[] hashes)
-
Method Summary
Methods (Modifier and Type, Method, Description):
List<Integer> clusterIndexToDataIndex(int clusterIndex)
int getBestFitCluster(Hash testHash): Return the index of the cluster whose centroid is most similar to the supplied hash
FuzzyHash getCenteroid(int cluster): Get the fuzzy hash representing the specified cluster
List<Hash> getCluster(int cluster)
int[] getClusterData()
Map<Integer,List<Hash>> getClusters()
Map<Integer,Double> getPotentialFits(Hash testHash, double sigma): Check which clusters have a high chance of containing the closest neighbor to this hash
double getSilhouetteCoef(int cluster)
DoubleSummaryStatistics getStats(int cluster)
double getSumSquaredError()
double getSumSquaredError(int cluster)
int indexToCluster(int index): Return the cluster index for the data point at the given index
int lookupClusterIdForKnownHash(Hash testHash): Return the cluster index for a hash that was used during clustering
void printInformation(boolean includeSilhouetteCoefficient)
-
-
-
Field Detail
-
numberOfClusters
protected int numberOfClusters
-
clusterIndex
protected int[] clusterIndex
Keeps track of which cluster each data point belongs to
-
stats
protected HashMap<Integer,DoubleSummaryStatistics> stats
Summary statistics (min, max, and average) for each cluster
-
entriesInCluster
protected HashMap<Integer,List<Integer>> entriesInCluster
Maps each cluster index (key) to the data indices (value) in that cluster
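The relationship between clusterIndex and entriesInCluster can be illustrated with a minimal sketch. It assumes the map is obtained by grouping data indices by their assigned cluster; the class and method names below are hypothetical and are not part of ClusterResult.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClusterIndexGrouping {

    // Hypothetical helper (not part of ClusterResult): clusterIndex[i] holds the
    // cluster assigned to data point i; the result maps each cluster index to
    // the list of data indices assigned to it.
    public static Map<Integer, List<Integer>> groupByCluster(int[] clusterIndex) {
        Map<Integer, List<Integer>> entriesInCluster = new HashMap<>();
        for (int dataIndex = 0; dataIndex < clusterIndex.length; dataIndex++) {
            entriesInCluster
                    .computeIfAbsent(clusterIndex[dataIndex], k -> new ArrayList<>())
                    .add(dataIndex);
        }
        return entriesInCluster;
    }

    public static void main(String[] args) {
        // Data points 0 and 2 fall into cluster 0, points 1 and 4 into cluster 1.
        int[] clusterIndex = {0, 1, 0, 2, 1};
        System.out.println(groupByCluster(clusterIndex));
    }
}
```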
-
-
Constructor Detail
-
ClusterResult
public ClusterResult(int[] clusterIndex, Hash[] hashes)
-
-
Method Detail
-
printInformation
public void printInformation(boolean includeSilhouetteCoefficient)
-
getStats
public DoubleSummaryStatistics getStats(int cluster)
-
getClusterData
public int[] getClusterData()
-
getSumSquaredError
public double getSumSquaredError(int cluster)
-
getSumSquaredError
public double getSumSquaredError()
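Neither overload is documented here, so the following is only a sketch of the textbook definition of the sum of squared errors: the squared distance of every point to the centroid of its own cluster, summed per cluster or over all points. The distance measure the library uses is not stated on this page, so the sketch takes precomputed distances as input; all names are hypothetical.

```java
class SumSquaredErrorSketch {

    // Hypothetical helper: SSE of a single cluster. distanceToCentroid[i] is the
    // distance of data point i to the centroid of the cluster it was assigned to.
    public static double sumSquaredError(int[] clusterIndex, double[] distanceToCentroid, int cluster) {
        double sse = 0;
        for (int i = 0; i < clusterIndex.length; i++) {
            if (clusterIndex[i] == cluster) {
                sse += distanceToCentroid[i] * distanceToCentroid[i];
            }
        }
        return sse;
    }

    // Hypothetical helper: SSE over all data points, i.e. the sum of the per-cluster values.
    public static double sumSquaredError(double[] distanceToCentroid) {
        double sse = 0;
        for (double d : distanceToCentroid) {
            sse += d * d;
        }
        return sse;
    }

    public static void main(String[] args) {
        int[] clusterIndex = {0, 0, 1};
        double[] distances = {0.5, 0.5, 1.0};
        System.out.println(sumSquaredError(clusterIndex, distances, 0));
        System.out.println(sumSquaredError(distances));
    }
}
```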
-
getSilhouetteCoef
public double getSilhouetteCoef(int cluster)
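This method carries no description, so the sketch below shows only the standard silhouette coefficient, averaged over the members of one cluster: for each point, a is the mean distance to the other members of its cluster, b is the smallest mean distance to any other cluster, and the score is (b - a) / max(a, b). Whether the library computes it exactly this way is an assumption; the pairwise distance matrix and all names are hypothetical.

```java
class SilhouetteSketch {

    // Hypothetical helper: mean silhouette value of all points in the given cluster.
    // dist[i][j] is the distance between data points i and j.
    public static double silhouette(double[][] dist, int[] clusterIndex, int numberOfClusters, int cluster) {
        double total = 0;
        int members = 0;
        for (int i = 0; i < clusterIndex.length; i++) {
            if (clusterIndex[i] != cluster) {
                continue;
            }
            members++;
            // Sum and count of distances from point i to the points of every cluster.
            double[] sum = new double[numberOfClusters];
            int[] count = new int[numberOfClusters];
            for (int j = 0; j < clusterIndex.length; j++) {
                if (j != i) {
                    sum[clusterIndex[j]] += dist[i][j];
                    count[clusterIndex[j]]++;
                }
            }
            if (count[cluster] == 0) {
                continue; // singleton cluster: the point scores 0 by convention
            }
            double a = sum[cluster] / count[cluster];
            double b = Double.POSITIVE_INFINITY;
            for (int c = 0; c < numberOfClusters; c++) {
                if (c != cluster && count[c] > 0) {
                    b = Math.min(b, sum[c] / count[c]);
                }
            }
            if (Double.isInfinite(b)) {
                continue; // no other populated cluster: treat the score as 0
            }
            double max = Math.max(a, b);
            if (max > 0) {
                total += (b - a) / max;
            }
        }
        return members == 0 ? 0 : total / members;
    }
}
```

Values close to 1 indicate a compact, well separated cluster; values near 0 or below suggest overlap with a neighboring cluster.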
-
getBestFitCluster
public int getBestFitCluster(Hash testHash)
Return the index of the cluster whose centroid is most similar to the supplied hash.
- Parameters:
testHash - the hash to check against the clusters
- Returns:
- the category (index) of the best matching cluster
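The best-fit search can be pictured as a nearest-centroid lookup. The sketch below is a simplification under stated assumptions: hashes and centroids are plain 64-bit values compared by Hamming distance, whereas the library represents centroids as FuzzyHash objects whose distance computation is not described on this page. The class and method names are hypothetical.

```java
class BestFitSketch {

    // Hypothetical helper: returns the index of the centroid with the smallest
    // Hamming distance to testHash, or -1 if there are no centroids.
    public static int bestFitCluster(long[] centroids, long testHash) {
        int best = -1;
        int bestDistance = Integer.MAX_VALUE;
        for (int cluster = 0; cluster < centroids.length; cluster++) {
            // XOR marks the differing bits; bitCount counts them.
            int distance = Long.bitCount(centroids[cluster] ^ testHash);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = cluster;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        long[] centroids = {0b0000L, 0b1111L};
        System.out.println(bestFitCluster(centroids, 0b0111L));
    }
}
```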
-
lookupClusterIdForKnownHash
public int lookupClusterIdForKnownHash(Hash testHash)
Return the cluster index for a hash that was used during clustering. This method returns the same value as getBestFitCluster(Hash) but is quicker.
- Parameters:
testHash - the hash to look up the cluster id for
- Returns:
- the index of the cluster this hash belongs to
- Throws:
NullPointerException - if the hash wasn't used during clustering.
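A map-based lookup is one plausible reason this method is quicker and why an unknown hash produces a NullPointerException. The sketch below is an assumption about the mechanism, not the library's code: a missing key yields null, and unboxing that null Integer to int throws. String keys stand in for Hash objects, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class KnownHashLookupSketch {

    // Stand-in for a map from each hash seen during clustering to its cluster index.
    private final Map<String, Integer> entryToCluster = new HashMap<>();

    public void register(String hashKey, int cluster) {
        entryToCluster.put(hashKey, cluster);
    }

    // Constant-time lookup. For an unknown key, get() returns null and the
    // implicit Integer-to-int unboxing throws a NullPointerException.
    public int lookup(String hashKey) {
        return entryToCluster.get(hashKey);
    }

    public static void main(String[] args) {
        KnownHashLookupSketch sketch = new KnownHashLookupSketch();
        sketch.register("hashA", 2);
        System.out.println(sketch.lookup("hashA"));
        try {
            sketch.lookup("neverClustered");
        } catch (NullPointerException e) {
            System.out.println("unknown hash");
        }
    }
}
```

For hashes that were not part of the clustering input, getBestFitCluster(Hash) is the documented alternative.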
-
indexToCluster
public int indexToCluster(int index)
Return the cluster index for the data point at the given index.
- Parameters:
index - the index of the data point when the cluster method was called
- Returns:
- the cluster index
-
getCenteroid
public FuzzyHash getCenteroid(int cluster)
Get the fuzzy hash representing the specified cluster.
- Parameters:
cluster - the cluster index
- Returns:
- the centroid of the cluster
-
getPotentialFits
public Map<Integer,Double> getPotentialFits(Hash testHash, double sigma)
Check which clusters have a high chance of containing the closest neighbor to this hash.
- Parameters:
testHash - the hash to check which cluster it belongs to
sigma - a stretch factor indicating how much error from a cluster center to the hash is allowed, based on the range of the distances within the cluster. With the original dataset, a sigma of 1 would include the best fit cluster with 100% certainty.
- Returns:
- the indices of the clusters in which a match is most likely
-
-