Class CategoricalMatcher

  • All Implemented Interfaces:
    CategoricalImageMatcher
    Direct Known Subclasses:
    WeightedCategoricalMatcher

    public class CategoricalMatcher
    extends AbstractCategoricalMatcher
    Clusters images into common categories. This matcher clusters images by computing the distance to the closest cluster and adds an image to that cluster if it is within a given distance. This method is only approximate; recomputeCategories() has to be called after images have been added.

    Cluster centroids are represented as FuzzyHashes: a prototype hash computed as the mode hash of all added images. ImplNote: TODO: The weighted categorical matcher employs different techniques to speed up cluster recomputation because it was so slow. While this class is usually faster, there is no reason not to port the improvements over to this class as well.
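The clustering idea described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: it assumes 64-bit hashes stored in a `long`, a normalized Hamming distance, and a majority vote per bit as the mode hash. The names `ThresholdClusterSketch`, `Cluster`, `distance`, and `add` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the JImageHash implementation. */
final class ThresholdClusterSketch {

    /** Tracks, per bit position, how many member hashes have that bit set. */
    static final class Cluster {
        final int[] oneCounts = new int[64];
        int members = 0;

        void add(long hash) {
            for (int bit = 0; bit < 64; bit++) {
                if (((hash >>> bit) & 1L) == 1L) {
                    oneCounts[bit]++;
                }
            }
            members++;
        }

        /** Mode hash: each bit takes the majority value of the members (ties resolve to 0). */
        long modeHash() {
            long mode = 0L;
            for (int bit = 0; bit < 64; bit++) {
                if (2 * oneCounts[bit] > members) {
                    mode |= (1L << bit);
                }
            }
            return mode;
        }
    }

    final List<Cluster> clusters = new ArrayList<>();
    final double newCategoryThreshold;

    ThresholdClusterSketch(double newCategoryThreshold) {
        this.newCategoryThreshold = newCategoryThreshold;
    }

    /** Normalized Hamming distance in [0, 1] between two 64-bit hashes. */
    static double distance(long a, long b) {
        return Long.bitCount(a ^ b) / 64.0;
    }

    /** Joins the closest cluster if it is within the threshold, else opens a new one. */
    int add(long hash) {
        int best = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int i = 0; i < clusters.size(); i++) {
            double d = distance(clusters.get(i).modeHash(), hash);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        if (best == -1 || bestDistance > newCategoryThreshold) {
            clusters.add(new Cluster());
            best = clusters.size() - 1;
        }
        clusters.get(best).add(hash);
        return best;
    }
}
```

Because each image is assigned against the centroids as they were at insertion time, earlier assignments can become stale as centroids drift, which is one reason a later recomputation pass is needed.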

    Since:
    3.0.0
    Author:
    Kilian
    • Field Detail

      • clusterRecomputed

        protected boolean clusterRecomputed
        Whether the categories are up to date or dirty and in need of recomputation.
      • categories

        protected TreeSet<Integer> categories
        Cluster ids currently used by the matcher.
      • subCategoryMatcher

        protected Map<Integer,​CategoricalImageMatcher> subCategoryMatcher
        Chained matchers which subcategorize computed clusters even further. TODO: currently not implemented, as cluster re-evaluation requires access to the base image, which is not available here.
      • newCategoryThreshold

        protected double newCategoryThreshold
        The distance to the closest existing cluster beyond which an image is placed in a cluster of its own.
      • categoriesAltered

        protected Set<Integer> categoriesAltered
        Keeps track of which categories were changed during the last iteration. It is only necessary to compute the distance to the changed categories (as well as the image's current one), which reduces the search considerably.
    • Constructor Detail

      • CategoricalMatcher

        public CategoricalMatcher​(double newAdditionThreshold)
    • Method Detail

      • recomputeCategories

        public void recomputeCategories()
        Description copied from interface: CategoricalImageMatcher
        Recompute the category definition of this clustering matcher and its nested matchers.

        Recomputing categories takes recently added images into account and updates image/category affiliation if necessary. This operation needs to be called manually due to the potentially high cost of this method call.

        Unless otherwise noted, the matcher makes no guarantee that an image's category remains unchanged when this method is executed.
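To make the cost and the possible category changes concrete, the following sketch shows one way an iterative recomputation could work. It is an assumption for illustration, not the library's algorithm: hashes are 64-bit `long` values, centroids are bitwise majority votes, and the loop stops when no assignment changes or when `maxIterations` is reached. The class `RecomputeSketch` and its methods are hypothetical names.

```java
/** Illustrative sketch only; not the JImageHash implementation. */
final class RecomputeSketch {

    static double distance(long a, long b) {
        return Long.bitCount(a ^ b) / 64.0;
    }

    /** Bitwise majority vote over the hashes currently assigned to the given cluster. */
    static long modeHash(long[] hashes, int[] assignment, int cluster) {
        int[] ones = new int[64];
        int members = 0;
        for (int i = 0; i < hashes.length; i++) {
            if (assignment[i] != cluster) {
                continue;
            }
            members++;
            for (int bit = 0; bit < 64; bit++) {
                if (((hashes[i] >>> bit) & 1L) == 1L) {
                    ones[bit]++;
                }
            }
        }
        long mode = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (2 * ones[bit] > members) {
                mode |= (1L << bit);
            }
        }
        return mode;
    }

    /** Reassigns in place; returns true if the assignments stabilized within maxIterations. */
    static boolean recompute(long[] hashes, int[] assignment, int clusterCount, int maxIterations) {
        for (int iter = 0; iter < maxIterations; iter++) {
            long[] centroids = new long[clusterCount];
            int[] sizes = new int[clusterCount];
            for (int a : assignment) {
                sizes[a]++;
            }
            for (int c = 0; c < clusterCount; c++) {
                centroids[c] = modeHash(hashes, assignment, c);
            }
            boolean changed = false;
            for (int i = 0; i < hashes.length; i++) {
                int best = assignment[i];
                double bestDistance = distance(centroids[best], hashes[i]);
                for (int c = 0; c < clusterCount; c++) {
                    if (sizes[c] == 0) {
                        continue; // empty clusters have no meaningful centroid
                    }
                    double d = distance(centroids[c], hashes[i]);
                    if (d < bestDistance) {
                        bestDistance = d;
                        best = c;
                    }
                }
                if (best != assignment[i]) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                return true;
            }
        }
        return false;
    }
}
```

Every pass touches every image and every non-empty cluster, which is why a call of this kind is comparatively expensive and left to the caller to trigger.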

      • recomputeClusters

        protected boolean recomputeClusters​(int maxIterations)
      • clusterPrecomputation

        protected void clusterPrecomputation()
        Method invoked before recomputeClusters(int) is called.
      • clusterPostcomputation

        protected void clusterPostcomputation()
        Method invoked after recomputeClusters(int) has been called.
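The two hooks suggest a template-method arrangement. The sketch below shows that pattern in isolation; the call order inside `recomputeCategories()`, the iteration cap of 10, and the subclass `TimedRecomputeSketch` are assumptions made for illustration and are not taken from the library source.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative base class; the call order is an assumption based on the hook descriptions. */
class HookedRecomputeSketch {
    final List<String> log = new ArrayList<>();

    /** Runs the pre hook, the core recomputation, then the post hook. */
    public final void recomputeCategories() {
        clusterPrecomputation();
        recomputeClusters(10); // iteration cap chosen arbitrarily for this sketch
        clusterPostcomputation();
    }

    protected void clusterPrecomputation() {
        log.add("pre");
    }

    protected boolean recomputeClusters(int maxIterations) {
        log.add("recompute");
        return true;
    }

    protected void clusterPostcomputation() {
        log.add("post");
    }
}

/** Hypothetical subclass that uses the hooks to time the recomputation. */
class TimedRecomputeSketch extends HookedRecomputeSketch {
    long startNanos;
    long elapsedNanos = -1L;

    @Override
    protected void clusterPrecomputation() {
        super.clusterPrecomputation();
        startNanos = System.nanoTime();
    }

    @Override
    protected void clusterPostcomputation() {
        elapsedNanos = System.nanoTime() - startNanos;
        super.clusterPostcomputation();
    }
}
```

A subclass can use the hooks in this way to prepare or tear down auxiliary state without overriding the recomputation itself.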
      • getCategory

        protected int getCategory​(int iter,
                                  String uniqueId,
                                  Hash[] hashes,
                                  Set<Integer> categoriesAltered)
      • updateCategories

        protected void updateCategories​(Map<String,​dev.brachtendorf.datastructures.Pair<Integer,​Hash[]>> newImageCategoryMap)
      • cleanupEmptyCategories

        protected void cleanupEmptyCategories()
      • resetBitWeights

        protected void resetBitWeights()
        Reset the bit weights of the clustroids' hashes while keeping the mode hash intact.
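One possible reading of this operation is sketched below, assuming a centroid stores a count of ones and zeros per bit and derives its mode hash by majority. Resetting then collapses each bit to a single vote for its current mode value. The class `BitWeightSketch` and all of its members are hypothetical, and the actual FuzzyHash bookkeeping may differ.

```java
/** Illustrative sketch only; not the FuzzyHash implementation. */
final class BitWeightSketch {
    final int[] ones;
    final int[] zeros;

    BitWeightSketch(int bits) {
        ones = new int[bits];
        zeros = new int[bits];
    }

    /** Adds one vote per bit from the supplied hash. */
    void merge(boolean[] hash) {
        for (int i = 0; i < ones.length; i++) {
            if (hash[i]) {
                ones[i]++;
            } else {
                zeros[i]++;
            }
        }
    }

    /** Mode hash: a bit is set when more members voted 1 than 0. */
    boolean[] modeHash() {
        boolean[] mode = new boolean[ones.length];
        for (int i = 0; i < ones.length; i++) {
            mode[i] = ones[i] > zeros[i];
        }
        return mode;
    }

    /** Drops the accumulated weights but keeps one vote per bit for the current mode value. */
    void resetBitWeights() {
        boolean[] mode = modeHash();
        for (int i = 0; i < ones.length; i++) {
            ones[i] = mode[i] ? 1 : 0;
            zeros[i] = mode[i] ? 0 : 1;
        }
    }
}
```

After such a reset the mode hash is unchanged, but newly merged hashes carry far more influence than they would against the accumulated counts.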
      • addHashingAlgorithm

        public boolean addHashingAlgorithm​(HashingAlgorithm algo)
        Description copied from class: PlainImageMatcher
        Append a new hashing algorithm to be used by this matcher. The same algorithm may only be added once. Attempting to add the same algorithm twice is a no-op.

        For some matchers the order of added hashing algorithms is crucial. The order the hashes are added is preserved.

        Overrides:
        addHashingAlgorithm in class PlainImageMatcher
        Parameters:
        algo - The algorithm to be added
        Returns:
        true if the algorithm was added, false if it was already present
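The contract above (insertion order preserved, duplicates ignored, boolean result) matches the behavior of `java.util.LinkedHashSet.add`. The sketch below demonstrates that contract with a hypothetical generic registry; whether the matcher actually stores its algorithms this way is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch of an ordered, duplicate-free algorithm registry. */
final class AlgorithmRegistrySketch<A> {
    private final Set<A> algorithms = new LinkedHashSet<>();

    /** Returns true if the algorithm was added, false if it was already present. */
    boolean addHashingAlgorithm(A algo) {
        return algorithms.add(algo);
    }

    /** Returns the algorithms in the order in which they were first added. */
    List<A> inOrder() {
        return new ArrayList<>(algorithms);
    }
}
```

With this approach, "the same algorithm" is decided by `equals` and `hashCode` of the element type, so two separately constructed but equal instances count as duplicates.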
      • addCategoricalImages

        public void addCategoricalImages​(Collection<LabeledImage> images)
        The name of the labeled image serves as unique identifier
        Parameters:
        images - the images to categorize
      • addCategoricalImages

        public void addCategoricalImages​(LabeledImage... images)
      • addCategoricalImage

        public double addCategoricalImage​(LabeledImage labeledImage)
      • addCategoricalImage

        public double addCategoricalImage​(BufferedImage bi,
                                          int category,
                                          String uniqueId)
      • addCategoricalImage

        protected double addCategoricalImage​(Hash[] hashes,
                                             int category,
                                             String uniqueId)
      • computeDistanceToCluster

        protected double computeDistanceToCluster​(FuzzyHash cluster,
                                                  Hash imageHash)
      • categorizeImage

        public CategorizationResult categorizeImage​(BufferedImage bi)
        Compute the category of the supplied image. A category is a collection of similar images mapped to a common hash which minimizes the distance of all hashes mapped to this category.
        Specified by:
        categorizeImage in interface CategoricalImageMatcher
        Overrides:
        categorizeImage in class AbstractCategoricalMatcher
        Parameters:
        bi - The buffered image to categorize
        Returns:
        a pair whose first value is the category and whose second value is a distance measure between the category and the supplied image. Smaller distances mean a closer match.
      • categorizeImage

        protected CategorizationResult categorizeImage​(String uniqueId,
                                                       BufferedImage bi)
        Compute the category of the supplied image. A category is a collection of similar images mapped to a common hash which minimizes the distance of all hashes mapped to this category.
        Specified by:
        categorizeImage in class AbstractCategoricalMatcher
        Parameters:
        uniqueId - The id used to reference the image in the future.
        bi - The buffered image to categorize
        Returns:
        a pair whose first value is the category and whose second value is a distance measure between the category and the supplied image
      • categorizeImage

        protected CategorizationResult categorizeImage​(String uniqueId,
                                                       Hash[] hashes,
                                                       Set<Integer> categoriesAltered)
        Categorize an image on a subset of all categories with hashes present. This method is used during recomputation of the clusters and cuts down the number of comparisons that have to be made. If an image is newly inserted, the categoriesAltered set should contain all available categories.
        Parameters:
        uniqueId - The id used to reference the image in the future.
        hashes - a hash of the image for each hashing algorithm
        categoriesAltered - the set of categories which were altered since the last computation
        Returns:
        the best category and the distance to it. If no matching category can be found, returns -1 and Double.MAX_VALUE
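A restricted search of this kind can be sketched as follows, assuming one 64-bit centroid per category and a normalized Hamming distance. The class `SubsetCategorizeSketch`, its `Result` type, and the `centroids` map are hypothetical; only the sentinel values -1 and `Double.MAX_VALUE` come from the description above.

```java
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; not the JImageHash implementation. */
final class SubsetCategorizeSketch {

    /** Category and distance, standing in for the library's result type. */
    static final class Result {
        final int category;
        final double distance;

        Result(int category, double distance) {
            this.category = category;
            this.distance = distance;
        }
    }

    /** Searches only the altered categories; yields (-1, Double.MAX_VALUE) if none match. */
    static Result categorize(long hash, Map<Integer, Long> centroids, Set<Integer> categoriesAltered) {
        int bestCategory = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int category : categoriesAltered) {
            Long centroid = centroids.get(category);
            if (centroid == null) {
                continue; // category no longer exists
            }
            double d = Long.bitCount(centroid ^ hash) / 64.0;
            if (d < bestDistance) {
                bestDistance = d;
                bestCategory = category;
            }
        }
        return new Result(bestCategory, bestDistance);
    }
}
```

Categories left out of the set are never compared, so the caller is responsible for also checking the image's current category when deciding whether to move it.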
      • computeDistanceForCategory

        protected double computeDistanceForCategory​(Hash[] hashes,
                                                    int category,
                                                    double bestDistance)
        Compute the distance between an image and a category cluster midpoint. This method is used to compute the minimum distance and therefore might cut the computation short if the distance is higher than the supplied best distance cutoff.
        Parameters:
        hashes - an array containing the hash for an image for each hashing algorithm added to this matcher
        category - the category to compute the distance for
        bestDistance - the best distance found so far. May be used to cut the computation short.
        Returns:
        the distance between the image and the category midpoint, or Double.MAX_VALUE if bestDistance was exceeded and the computation was not finished
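The early-exit idea can be sketched as below. The aggregation is an assumption: the sketch averages a normalized Hamming distance over all hashing algorithms, and because every term is non-negative, the running sum divided by the total count is a lower bound on the final average. The class `EarlyExitDistanceSketch` and the `centroids` array are hypothetical.

```java
/** Illustrative sketch only; not the JImageHash implementation. */
final class EarlyExitDistanceSketch {

    /**
     * Averages the per-algorithm distances between an image's hashes and a category's
     * centroids. Returns Double.MAX_VALUE as soon as the partial result exceeds bestDistance.
     */
    static double distanceWithCutoff(long[] hashes, long[] centroids, double bestDistance) {
        if (hashes.length == 0 || hashes.length != centroids.length) {
            throw new IllegalArgumentException("need one centroid per hash and at least one hash");
        }
        double sum = 0.0;
        for (int i = 0; i < hashes.length; i++) {
            sum += Long.bitCount(hashes[i] ^ centroids[i]) / 64.0;
            // The remaining terms can only increase the sum, so this category cannot win.
            if (sum / hashes.length > bestDistance) {
                return Double.MAX_VALUE;
            }
        }
        return sum / hashes.length;
    }
}
```

Returning `Double.MAX_VALUE` on a cutoff lets the caller keep a plain "smaller is better" comparison without a separate flag.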
      • printClusterInfo

        public void printClusterInfo​(int minImagesInCluster)
      • getAverageDistanceWithinCluster

        public double getAverageDistanceWithinCluster​(int category)
      • getClusterAverageHash

        public FuzzyHash getClusterAverageHash​(HashingAlgorithm algorithm,
                                               int category)
        Get the average hash representing the midpoint of the category cluster.
        Parameters:
        algorithm - the algorithm to get the hash for
        category - the category
        Returns:
        the average hash.