Class CategoricalMatcher
- java.lang.Object
  - dev.brachtendorf.jimagehash.matcher.PlainImageMatcher
    - dev.brachtendorf.jimagehash.matcher.categorize.AbstractCategoricalMatcher
      - dev.brachtendorf.jimagehash.matcher.categorize.CategoricalMatcher
- All Implemented Interfaces:
- CategoricalImageMatcher
- Direct Known Subclasses:
- WeightedCategoricalMatcher
public class CategoricalMatcher extends AbstractCategoricalMatcher
Cluster images into common categories. This matcher clusters images by computing the distance to the closest cluster and adds an image if it is within a given distance. This method works only approximately; recomputeCategories() has to be called after images have been added.
Cluster centroids are represented as FuzzyHashes, each holding the mode hash of all images added to the cluster.
ImplNote: TODO the weighted categorical matcher employs different techniques to speed up cluster recomputation because it was so slow. While this class is usually faster, there is no reason not to port the improvements over to this class as well.
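The assignment rule described above can be illustrated with a minimal sketch. It is not the library's implementation: it assumes plain 64-bit hashes, a normalized Hamming distance, and that an image too far from every centroid opens a new cluster. The class and method names (ThresholdClusteringSketch, assign) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ThresholdClusteringSketch {

    /** Normalized Hamming distance between two 64-bit hashes, in [0, 1]. */
    static double distance(long a, long b) {
        return Long.bitCount(a ^ b) / 64.0;
    }

    /**
     * Returns the index of the cluster the hash is assigned to. Assumption:
     * a new cluster is opened when no centroid lies within the threshold.
     */
    static int assign(List<Long> centroids, long hash, double threshold) {
        int best = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int i = 0; i < centroids.size(); i++) {
            double d = distance(centroids.get(i), hash);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        if (best == -1 || bestDistance > threshold) {
            centroids.add(hash); // the hash becomes the centroid of a new cluster
            return centroids.size() - 1;
        }
        return best;
    }
}
```

Because the centroids are not updated when an image joins a cluster in this sketch, the result depends on insertion order, which is one reason a later recomputation step is needed.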
- Since:
- 3.0.0
- Author:
- Kilian
-
-
Field Summary
Fields
- protected Map<String,Map<HashingAlgorithm,Hash>> cachedHashes
  Hashes of the added images.
- protected TreeSet<Integer> categories
  Cluster ids currently used by the matcher.
- protected Set<Integer> categoriesAltered
  Keeps track of which categories were changed during the last iteration.
- protected Map<HashingAlgorithm,Map<Integer,FuzzyHash>> clusterHash
  The cluster centroid of a given hashing algorithm and category.
- protected Map<Integer,DoubleSummaryStatistics> clusterQuality
  Internal cluster distances.
- protected boolean clusterRecomputed
  Whether the categories have been recomputed or are dirty.
- protected Map<HashingAlgorithm,Map<FuzzyHash,Integer>> clusterReverseLookup
  Quick lookup of the category of a fuzzy cluster hash.
- protected double newCategoryThreshold
  The distance threshold that decides whether an image is placed in a cluster of its own.
- protected Map<Integer,CategoricalImageMatcher> subCategoryMatcher
  Chained matchers which subcategorize computed clusters even further.
Fields inherited from class dev.brachtendorf.jimagehash.matcher.categorize.AbstractCategoricalMatcher
cachedImagesInCategory, reverseImageCategoryMap
-
Fields inherited from class dev.brachtendorf.jimagehash.matcher.PlainImageMatcher
steps
-
-
Constructor Summary
Constructors
- CategoricalMatcher(double newAdditionThreshold)
-
Method Summary
All Methods / Instance Methods / Concrete Methods
- protected double addCategoricalImage(Hash[] hashes, int category, String uniqueId)
- double addCategoricalImage(LabeledImage labeledImage)
- double addCategoricalImage(BufferedImage bi, int category, String uniqueId)
- void addCategoricalImages(LabeledImage... images)
- void addCategoricalImages(Collection<LabeledImage> images)
  The name of the labeled image serves as unique identifier.
- boolean addHashingAlgorithm(HashingAlgorithm algo)
  Append a new hashing algorithm to be used by this matcher.
- BufferedImage categoricalHashToImage(HashingAlgorithm hashAlgorithm, int category, int blockSize)
- CategorizationResult categorizeImage(BufferedImage bi)
  Compute the category of the supplied image.
- protected CategorizationResult categorizeImage(String uniqueId, Hash[] hashes, Set<Integer> categoriesAltered)
  Categorize an image on a subset of all categories with hashes present.
- protected CategorizationResult categorizeImage(String uniqueId, BufferedImage bi)
  Compute the category of the supplied image.
- CategorizationResult categorizeImageAndAdd(BufferedImage bi, String uniqueId)
  Compute the closest category of an image and afterwards add it to the internal categorization queue.
- protected void cleanupEmptyCategories()
- protected void clusterPostcomputation()
  Method invoked after recomputeClusters(int) is called.
- protected void clusterPrecomputation()
  Method invoked before recomputeClusters(int) is called.
- protected double computeDistanceForCategory(Hash[] hashes, int category, double bestDistance)
  Compute the distance between an image and a category cluster midpoint.
- protected double computeDistanceToCluster(FuzzyHash cluster, Hash imageHash)
- double getAverageDistanceWithinCluster(int category)
- List<Integer> getCategories()
  Get a list of available categories this matcher matched images to.
- LinkedHashMap<Integer,Integer> getCategoriesSortedByImageCount()
  Return the categories sorted by the number of images mapped to the category.
- protected int getCategory(int iter, String uniqueId, Hash[] hashes, Set<Integer> categoriesAltered)
- FuzzyHash getClusterAverageHash(HashingAlgorithm algorithm, int category)
  Get the average hash representing the midpoint of the category cluster.
- void printClusterInfo(int minImagesInCluster)
- void recomputeCategories()
  Recompute the category definition of this clustering matcher and its nested matchers.
- protected boolean recomputeClusters(int maxIterations)
- protected void resetBitWeights()
  Reset the cluster centroids' hash weights while keeping the mode hash intact.
- protected void updateCategories(Map<String,dev.brachtendorf.datastructures.Pair<Integer,Hash[]>> newImageCategoryMap)
Methods inherited from class dev.brachtendorf.jimagehash.matcher.categorize.AbstractCategoricalMatcher
getCategory, getImageCountInCategory, getImagesInCategory, isCategorized
-
Methods inherited from class dev.brachtendorf.jimagehash.matcher.PlainImageMatcher
clearHashingAlgorithms, equals, getAlgorithms, hashCode, removeHashingAlgorithm
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface dev.brachtendorf.jimagehash.matcher.categorize.CategoricalImageMatcher
categorizeImage, categorizeImageAndAdd
-
-
-
-
Field Detail
-
clusterHash
protected Map<HashingAlgorithm,Map<Integer,FuzzyHash>> clusterHash
The cluster centroid of a given hashing algorithm and category.
-
clusterReverseLookup
protected Map<HashingAlgorithm,Map<FuzzyHash,Integer>> clusterReverseLookup
Quick lookup of the category of a fuzzy cluster hash.
-
cachedHashes
protected Map<String,Map<HashingAlgorithm,Hash>> cachedHashes
Hashes of the added images
-
clusterQuality
protected Map<Integer,DoubleSummaryStatistics> clusterQuality
Internal cluster distances
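As an illustration of how a Map<Integer,DoubleSummaryStatistics> can track distances inside each cluster, here is a minimal sketch using only the JDK. The class ClusterQualitySketch and its methods are hypothetical, and returning 0.0 for an unknown category is an assumption of this sketch, not documented behavior of the matcher.

```java
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.Map;

class ClusterQualitySketch {

    // Same shape as the clusterQuality field: category id -> distance statistics
    final Map<Integer, DoubleSummaryStatistics> clusterQuality = new HashMap<>();

    /** Record the distance of one member image to its cluster centroid. */
    void record(int category, double distance) {
        clusterQuality
                .computeIfAbsent(category, k -> new DoubleSummaryStatistics())
                .accept(distance);
    }

    /** Average member distance; 0.0 for unknown categories (assumption). */
    double averageDistance(int category) {
        DoubleSummaryStatistics stats = clusterQuality.get(category);
        return stats == null ? 0.0 : stats.getAverage();
    }
}
```

DoubleSummaryStatistics also exposes getMin(), getMax() and getCount(), so one pass over the members yields all common quality figures.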
-
clusterRecomputed
protected boolean clusterRecomputed
Whether the categories have been recomputed or are dirty.
-
subCategoryMatcher
protected Map<Integer,CategoricalImageMatcher> subCategoryMatcher
Chained matchers which subcategorize computed clusters even further. TODO currently not implemented, as cluster re-evaluation requires access to the base image, which we do not have!
-
newCategoryThreshold
protected double newCategoryThreshold
The distance threshold that decides whether an image is placed in a cluster of its own.
-
-
Method Detail
-
recomputeCategories
public void recomputeCategories()
Description copied from interface: CategoricalImageMatcher
Recompute the category definition of this clustering matcher and its nested matchers.
Recomputing categories will take recently added images into account and update image/category affiliation if necessary. This operation needs to be called manually due to the potentially high cost of this method call.
Unless otherwise noted, the matcher makes no guarantee that the image category does not change with this method execution.
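To show why recomputation can be costly and why image categories may change, the following sketch runs a k-modes-style loop over 64-bit hashes: reassign every hash to its nearest centroid, rebuild each centroid as the bitwise mode of its members, and stop once nothing changes or an iteration limit is hit. This is an assumed illustration, not the code behind recomputeCategories(); all names are hypothetical.

```java
class RecomputeLoopSketch {

    /** Index of the centroid with the smallest Hamming distance to the hash. */
    static int nearest(long[] centroids, long hash) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Long.bitCount(centroids[c] ^ hash) < Long.bitCount(centroids[best] ^ hash)) {
                best = c;
            }
        }
        return best;
    }

    /** Bitwise majority vote over the members of a cluster; ties resolve to 0. */
    static long mode(long[] hashes, int[] assignment, int cluster, long fallback) {
        int members = 0;
        int[] ones = new int[64];
        for (int i = 0; i < hashes.length; i++) {
            if (assignment[i] != cluster) continue;
            members++;
            for (int b = 0; b < 64; b++) {
                ones[b] += (int) ((hashes[i] >>> b) & 1L);
            }
        }
        if (members == 0) return fallback; // keep the old centroid for empty clusters
        long result = 0L;
        for (int b = 0; b < 64; b++) {
            if (2 * ones[b] > members) result |= 1L << b;
        }
        return result;
    }

    /** Returns true if the assignment stabilized within maxIterations. */
    static boolean recompute(long[] hashes, long[] centroids, int[] assignment, int maxIterations) {
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int i = 0; i < hashes.length; i++) {
                int c = nearest(centroids, hashes[i]);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) return true;
            for (int c = 0; c < centroids.length; c++) {
                centroids[c] = mode(hashes, assignment, c, centroids[c]);
            }
        }
        return false;
    }
}
```

Each iteration compares every hash against every centroid, so the cost grows with images × clusters × iterations.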
-
recomputeClusters
protected boolean recomputeClusters(int maxIterations)
-
clusterPrecomputation
protected void clusterPrecomputation()
Method invoked before recomputeClusters(int) is called.
-
clusterPostcomputation
protected void clusterPostcomputation()
Method invoked after recomputeClusters(int) is called.
-
getCategory
protected int getCategory(int iter, String uniqueId, Hash[] hashes, Set<Integer> categoriesAltered)
-
updateCategories
protected void updateCategories(Map<String,dev.brachtendorf.datastructures.Pair<Integer,Hash[]>> newImageCategoryMap)
-
cleanupEmptyCategories
protected void cleanupEmptyCategories()
-
resetBitWeights
protected void resetBitWeights()
Reset the cluster centroids' hash weights while keeping the mode hash intact.
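One way to picture "resetting weights while keeping the mode hash" is a centroid that stores a signed vote per bit. The sketch below is an assumption made for illustration and does not reflect how FuzzyHash stores its weights; BitWeightSketch and its members are hypothetical names.

```java
class BitWeightSketch {

    // Per-bit vote balance: > 0 leans towards a 1 bit, <= 0 towards a 0 bit
    final double[] weight;

    BitWeightSketch(int bits) {
        weight = new double[bits];
    }

    /** Merge one image hash into the centroid by casting a vote per bit. */
    void merge(boolean[] hash) {
        for (int i = 0; i < weight.length; i++) {
            weight[i] += hash[i] ? 1 : -1;
        }
    }

    /** The mode hash: the majority value of every bit (ties resolve to 0). */
    boolean[] modeHash() {
        boolean[] mode = new boolean[weight.length];
        for (int i = 0; i < weight.length; i++) {
            mode[i] = weight[i] > 0;
        }
        return mode;
    }

    /** Collapse accumulated votes to unit weights; modeHash() stays the same. */
    void resetWeights() {
        for (int i = 0; i < weight.length; i++) {
            weight[i] = weight[i] > 0 ? 1 : -1;
        }
    }
}
```

After a reset, the centroid still represents the same hash, but a few newly merged images can flip its bits, which lets clusters move during the next recomputation.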
-
addHashingAlgorithm
public boolean addHashingAlgorithm(HashingAlgorithm algo)
Description copied from class: PlainImageMatcher
Append a new hashing algorithm to be used by this matcher. The same algorithm may only be added once. Attempting to add the same algorithm twice is a no-op.
For some matchers the order of added hashing algorithms is crucial. The order in which the hashes are added is preserved.
- Overrides:
- addHashingAlgorithm in class PlainImageMatcher
- Parameters:
- algo - The algorithm to be added
- Returns:
- true if the algorithm was added, false if it was already present
-
addCategoricalImages
public void addCategoricalImages(Collection<LabeledImage> images)
The name of the labeled image serves as unique identifier.
- Parameters:
- images - the images to categorize
-
addCategoricalImages
public void addCategoricalImages(LabeledImage... images)
-
addCategoricalImage
public double addCategoricalImage(LabeledImage labeledImage)
-
addCategoricalImage
public double addCategoricalImage(BufferedImage bi, int category, String uniqueId)
-
addCategoricalImage
protected double addCategoricalImage(Hash[] hashes, int category, String uniqueId)
-
computeDistanceToCluster
protected double computeDistanceToCluster(FuzzyHash cluster, Hash imageHash)
-
categorizeImage
public CategorizationResult categorizeImage(BufferedImage bi)
Compute the category of the supplied image. A category is a collection of similar images mapped to a common hash which minimizes the distance of all hashes mapped to this category.
- Specified by:
- categorizeImage in interface CategoricalImageMatcher
- Overrides:
- categorizeImage in class AbstractCategoricalMatcher
- Parameters:
- bi - The buffered image to categorize
- Returns:
- a pair whose first value is the category and whose second value is a distance measure between the category and the supplied image. Smaller distances mean a closer match.
-
categorizeImage
protected CategorizationResult categorizeImage(String uniqueId, BufferedImage bi)
Compute the category of the supplied image. A category is a collection of similar images mapped to a common hash which minimizes the distance of all hashes mapped to this category.
- Specified by:
- categorizeImage in class AbstractCategoricalMatcher
- Parameters:
- uniqueId - The id used to reference the image in the future.
- bi - The buffered image to categorize
- Returns:
- a pair whose first value is the category and whose second value is a distance measure between the category and the supplied image
-
categorizeImage
protected CategorizationResult categorizeImage(String uniqueId, Hash[] hashes, Set<Integer> categoriesAltered)
Categorize an image on a subset of all categories with hashes present. This method is used during recomputation of the clusters and cuts down the number of comparisons that have to be made. If an image is newly inserted, the categoriesAltered variable should contain all available categories.
- Parameters:
- uniqueId - The id used to reference the image in the future.
- hashes - a hash of the image for each hashing algorithm
- categoriesAltered - a list of the categories which got altered since the last computation
- Returns:
- the best category and the distance to it. If no matching category can be found, returns -1 and Double.MAX_VALUE
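The restricted search can be sketched as follows, assuming one 64-bit hash per category and a normalized Hamming distance. The Result holder stands in for CategorizationResult and, like every other name here, is hypothetical. Only the categories listed in categoriesAltered are compared, and the documented fallback of -1 and Double.MAX_VALUE falls out of the initial values.

```java
import java.util.Map;
import java.util.Set;

class RestrictedSearchSketch {

    /** Hypothetical result holder: best category and the distance to it. */
    static final class Result {
        final int category;
        final double distance;

        Result(int category, double distance) {
            this.category = category;
            this.distance = distance;
        }
    }

    static Result categorize(long hash, Map<Integer, Long> centroids, Set<Integer> categoriesAltered) {
        int best = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int category : categoriesAltered) {
            Long centroid = centroids.get(category);
            if (centroid == null) continue; // no hash present for this category
            double d = Long.bitCount(centroid ^ hash) / 64.0;
            if (d < bestDistance) {
                bestDistance = d;
                best = category;
            }
        }
        return new Result(best, bestDistance);
    }
}
```

A caller that already knows the image's distance to its current category would compare that value against the returned one before moving the image.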
-
computeDistanceForCategory
protected double computeDistanceForCategory(Hash[] hashes, int category, double bestDistance)
Compute the distance between an image and a category cluster midpoint. This method is used to compute the minimum distance and therefore might cut the computation short if the distance is higher than the supplied best distance cutoff.
- Parameters:
- hashes - an array containing the hash of the image for each hashing algorithm added to this matcher
- category - the category to compute the distance for
- bestDistance - the best distance found so far. May be used to cut the computation short
- Returns:
- the distance between the image and the category midpoint, or Double.MAX_VALUE if bestDistance was reached and the computation was not finished
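The early exit can be sketched like this, under the assumption that the per-algorithm distances are normalized Hamming distances that are summed up. How the matcher actually combines the distances of several algorithms is not stated here, and the names below are hypothetical.

```java
class EarlyExitDistanceSketch {

    /**
     * Sums the per-algorithm distances between an image and a centroid.
     * Aborts with Double.MAX_VALUE as soon as the partial sum reaches
     * bestDistance, because this category can no longer win.
     */
    static double distanceForCategory(long[] imageHashes, long[] centroidHashes, double bestDistance) {
        double sum = 0;
        for (int i = 0; i < imageHashes.length; i++) {
            sum += Long.bitCount(imageHashes[i] ^ centroidHashes[i]) / 64.0;
            if (sum >= bestDistance) {
                return Double.MAX_VALUE;
            }
        }
        return sum;
    }
}
```

The saving grows with the number of hashing algorithms: once a good candidate is known, most other categories are rejected after the first one or two hash comparisons.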
-
categorizeImageAndAdd
public CategorizationResult categorizeImageAndAdd(BufferedImage bi, String uniqueId)
Description copied from interface: CategoricalImageMatcher
Compute the closest category of an image and afterwards add it to the internal categorization queue. Some matchers may choose to immediately update the current category to reflect the changes.
The add action is implementation dependent. Some categorizers may choose to directly incorporate the image and update its category representation; other algorithms may require a call to CategoricalImageMatcher.recomputeCategories() before the addition takes effect.
- Specified by:
- categorizeImageAndAdd in interface CategoricalImageMatcher
- Specified by:
- categorizeImageAndAdd in class AbstractCategoricalMatcher
- Parameters:
- bi - the image to categorize
- uniqueId - the unique id to reference the image by.
- Returns:
- the currently closest cluster for this image.
-
getCategories
public List<Integer> getCategories()
Description copied from interface: CategoricalImageMatcher
Get a list of available categories this matcher matched images to. Each category represents a set of images with high similarity.
- Specified by:
- getCategories in interface CategoricalImageMatcher
- Overrides:
- getCategories in class AbstractCategoricalMatcher
- Returns:
- A list of ids
-
getCategoriesSortedByImageCount
public LinkedHashMap<Integer,Integer> getCategoriesSortedByImageCount()
Return the categories sorted by the number of images mapped to the category. An image is considered part of a category if it was added by calling either addCategoricalImage(LabeledImage) or categorizeImageAndAdd(BufferedImage, String).
- Returns:
- a map containing the sorted categories
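A LinkedHashMap is a natural return type here because it preserves insertion order. The sketch below sorts a category-to-count map and copies it into a LinkedHashMap. The descending order and the meaning of the map values (image counts) are assumptions, since neither is stated above; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortedCategoriesSketch {

    /** Returns category -> image count, largest categories first (assumed order). */
    static LinkedHashMap<Integer, Integer> sortByImageCount(Map<Integer, Integer> imageCount) {
        List<Map.Entry<Integer, Integer>> entries = new ArrayList<>(imageCount.entrySet());
        entries.sort(Map.Entry.<Integer, Integer>comparingByValue().reversed());

        // LinkedHashMap keeps the sorted insertion order when iterated
        LinkedHashMap<Integer, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }
}
```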
-
categoricalHashToImage
public BufferedImage categoricalHashToImage(HashingAlgorithm hashAlgorithm, int category, int blockSize)
-
printClusterInfo
public void printClusterInfo(int minImagesInCluster)
-
getAverageDistanceWithinCluster
public double getAverageDistanceWithinCluster(int category)
-
getClusterAverageHash
public FuzzyHash getClusterAverageHash(HashingAlgorithm algorithm, int category)
Get the average hash representing the midpoint of the category cluster.
- Parameters:
- algorithm - the algorithm to get the hash for
- category - the category
- Returns:
- the average hash.
-
-