Class RandomForestCategorizer

  • All Implemented Interfaces:
    CategoricalImageMatcher

    @Experimental("The image matcher categorizes images based on their distance to the closest match. While this is an okay approach clustering the images based on category yields much cleaner results.")
    public class RandomForestCategorizer
    extends PlainImageMatcher
    implements CategoricalImageMatcher
    Deprecated.
    Not ready yet. This class has been rewritten.
    Author:
    Kilian
    • Field Detail

      • forest

        protected List<dev.brachtendorf.jimagehash.matcher.categorize.supervised.randomForest.TreeNode> forest
        Deprecated.
        Root nodes of all decision trees making up the random forest
      • labeledImages

        protected List<LabeledImage> labeledImages
        Deprecated.
        Test images used to create test sets to train the forest
    • Constructor Detail

      • RandomForestCategorizer

        public RandomForestCategorizer()
        Deprecated.
    • Method Detail

      • addTestImages

        public void addTestImages​(Collection<LabeledImage> data)
        Deprecated.
        Add test images to this image matcher which will be used to construct the random forest.

        Be aware that labeled images are kept in memory as long as clearTestImages() has not been called.

        Parameters:
        data - the images to add
      • addTestImages

        public void addTestImages​(LabeledImage... data)
        Deprecated.
        Add test images to this image matcher which will be used to construct the random forest.

        Be aware that labeled images are kept in memory as long as clearTestImages() has not been called.

        Parameters:
        data - the images to add
      • addTestImages

        public void addTestImages​(LabeledImage lData)
        Deprecated.
        Add a labeled image to this image matcher which will be used to construct the random forest.

        Be aware that labeled images are kept in memory as long as clearTestImages() has not been called.

        Parameters:
        lData - the image to add
      • clearTestImages

        public void clearTestImages()
        Deprecated.
        Clears the test images. All references held by this object are released, which allows the garbage collector to free the underlying buffered images if they are not referenced anywhere else.

        Be aware that you need to add new test images before calling trainMatcher(int, int, int).

      • trainMatcher

        public void trainMatcher​(int trees,
                                 int numVarsSearchRange,
                                 int numVarsRep)
        Deprecated.
        Populate the decision trees used in this image matcher. The forest has to be initialized again whenever new labeled test images are added.
        Parameters:
        trees - The number of trees to create. Has to be odd. More trees generally improve accuracy.
        numVarsSearchRange - The number of variables used in each tree. The variables are chosen at random. Not using every variable helps prevent overfitting.
        numVarsRep - The variables are numerical values which can appear multiple times per branch. This limits the number of consecutive times a single variable can appear in the same branch.
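        As a rough illustration of the two variable parameters above, the following sketch draws a random subset of variable indices for a tree and checks a repetition limit. The class VariableSubsetSketch and its methods are hypothetical and not part of this library. How the matcher actually selects variables is not documented here, so the per-tree sampling and the reading of numVarsRep are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration only; not part of the library. */
class VariableSubsetSketch {

    /** Picks k distinct variable indices out of [0, totalVars) uniformly at random. */
    static List<Integer> sampleVariables(int totalVars, int k, Random rng) {
        if (k < 1 || k > totalVars) {
            throw new IllegalArgumentException("k must be in [1, totalVars]");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < totalVars; i++) {
            indices.add(i);
        }
        // Shuffle all indices and keep the first k of them.
        Collections.shuffle(indices, rng);
        return new ArrayList<>(indices.subList(0, k));
    }

    /**
     * Assumed reading of numVarsRep: a variable may be used again only while
     * its consecutive use count is still below the limit.
     */
    static boolean mayReuse(int consecutiveUses, int numVarsRep) {
        return consecutiveUses < numVarsRep;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int tree = 0; tree < 3; tree++) {
            System.out.println("tree " + tree + " considers variables "
                    + sampleVariables(10, 4, rng));
        }
    }
}
```

        Because each tree sees a different subset, no single variable can dominate every tree, which is the usual reason this kind of sampling reduces overfitting.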
      • createForest

        protected Object[] createForest(int trees,
                                        int numVars,
                                        int numVarsRep,
                                        List<dev.brachtendorf.datastructures.Pair<FuzzyHash,HashingAlgorithm>> randomVariables,
                                        Map<HashingAlgorithm,Map<BufferedImage,Hash>> preComputedHashesTestAgainst,
                                        ExecutorService tPool)
        Deprecated.
        Parameters:
        trees - number of trees to create
        numVars - number of variables to try at each branch. Limiting this number helps prevent overfitting
        numVarsRep - number of duplicates for the same variable allowed in a single tree
        randomVariables - variables to check against
        preComputedHashesTestAgainst - hashes of test data
        tPool - thread pool executor
        Returns:
        Object array containing a reference to the root node, the out-of-bag classification error and the total error including training data
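        The out-of-bag error mentioned above is conventionally measured on the samples a tree never saw during training. The sketch below shows that general idea with bootstrap sampling. OutOfBagSketch and its methods are hypothetical names, and it is an assumption that this class samples its training sets this way.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of the library. */
class OutOfBagSketch {

    /** Draws n indices from [0, n) with replacement (a bootstrap sample). */
    static List<Integer> bootstrapSample(int n, Random rng) {
        List<Integer> sample = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            sample.add(rng.nextInt(n));
        }
        return sample;
    }

    /** Returns the indices in [0, n) that never appear in the sample. */
    static List<Integer> outOfBag(int n, List<Integer> sample) {
        Set<Integer> inBag = new HashSet<>(sample);
        List<Integer> oob = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!inBag.contains(i)) {
                oob.add(i);
            }
        }
        return oob;
    }

    /** Fraction of the given indices whose prediction differs from the label; 0 if the list is empty. */
    static double errorRate(int[] predicted, int[] actual, List<Integer> indices) {
        if (indices.isEmpty()) {
            return 0.0;
        }
        int wrong = 0;
        for (int i : indices) {
            if (predicted[i] != actual[i]) {
                wrong++;
            }
        }
        return (double) wrong / indices.size();
    }
}
```

        Calling errorRate with the out-of-bag indices gives an error estimate on unseen data, while calling it with all indices corresponds to a total error that includes training data.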
      • countLeafCategories

        public Map<Integer,​Integer> countLeafCategories()
        Deprecated.
      • categorizeImage

        public CategorizationResult categorizeImage​(BufferedImage bi)
        Deprecated.
        Compute the category of the supplied image. A category is a collection of similar images mapped to a common hash which minimizes the distance of all hashes mapped to this category.

        The distance returned by this method call indicates the fraction of trees in the random forest that agree with the decision, in the range (0, 1].

        Specified by:
        categorizeImage in interface CategoricalImageMatcher
        Parameters:
        bi - The buffered image to categorize
        Returns:
        a pair whose first value is the category and whose second value is a distance measure between the category and the supplied image. Smaller distances mean a closer match
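        To make the agreement value concrete, here is a minimal sketch of majority voting over per-tree decisions. ForestVoteSketch and VoteResult are hypothetical names, and the tie-breaking rule is an assumption; the actual implementation may differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the library. */
class ForestVoteSketch {

    /** Winning category plus the fraction of trees that voted for it. */
    static final class VoteResult {
        final int category;
        final double agreement;

        VoteResult(int category, double agreement) {
            this.category = category;
            this.agreement = agreement;
        }
    }

    /**
     * Counts one vote per tree and returns the most frequent category.
     * On a tie, the category that reached the top count first wins.
     */
    static VoteResult vote(int[] treeVotes) {
        if (treeVotes.length == 0) {
            throw new IllegalArgumentException("at least one tree is required");
        }
        Map<Integer, Integer> counts = new HashMap<>();
        int best = treeVotes[0];
        int bestCount = 0;
        for (int v : treeVotes) {
            int c = counts.merge(v, 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = v;
            }
        }
        // bestCount is at least 1, so the agreement lies in (0, 1].
        return new VoteResult(best, (double) bestCount / treeVotes.length);
    }

    public static void main(String[] args) {
        VoteResult r = vote(new int[] {2, 2, 5});
        System.out.println("category " + r.category + ", agreement " + r.agreement);
    }
}
```

        With exactly two categories, an odd number of trees rules out a tied vote, which is one plausible reason for requiring an odd tree count.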
      • recomputeCategories

        public void recomputeCategories()
        Deprecated.
        Description copied from interface: CategoricalImageMatcher
        Recompute the category definition of this clustering matcher and its nested matchers.

        Recomputing categories will take recently added images into account and update image/category affiliation if necessary. This operation needs to be called manually due to the potentially high cost of this method call.

        Unless otherwise noted the matcher makes no guarantee that the image category does not change with this method execution.

        Specified by:
        recomputeCategories in interface CategoricalImageMatcher
      • getImagesInCategory

        public List<String> getImagesInCategory​(int category)
        Deprecated.
        Description copied from interface: CategoricalImageMatcher
        Get the unique IDs of all images mapped to this category
        Specified by:
        getImagesInCategory in interface CategoricalImageMatcher
        Parameters:
        category - to check for
        Returns:
        a list of all unique IDs mapped to this category
      • getCategory

        public int getCategory​(String uniqueId)
        Deprecated.
        Description copied from interface: CategoricalImageMatcher
        Get the current category of the image described by this unique id. A category usually maps an image to a cluster.
        Specified by:
        getCategory in interface CategoricalImageMatcher
        Parameters:
        uniqueId - the id of a previously added image
        Returns:
        the category
      • printTree

        public void printTree()
        Deprecated.
      • categorizeImageAndAdd

        public CategorizationResult categorizeImageAndAdd​(BufferedImage bi,
                                                          String uniqueId)
        Deprecated.
        Description copied from interface: CategoricalImageMatcher
        Compute the closest category of an image and afterwards add it to the internal categorization queue. Some matchers may choose to immediately update the current category to reflect the changes.

        The add action is implementation dependent. Some categorizers may choose to directly incorporate the image and update its category representation; other algorithms may require a call to CategoricalImageMatcher.recomputeCategories() before the addition takes effect.

        Specified by:
        categorizeImageAndAdd in interface CategoricalImageMatcher
        Parameters:
        bi - the image to categorize
        uniqueId - the unique id to reference the image by.
        Returns:
        the currently closest cluster for this image.
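        The deferred variant described for this method can be pictured as a queue that is only merged into the categories on recomputation. The sketch below is a simplified stand-in: DeferredAddSketch and its queue method are hypothetical, and the closest category is passed in by the caller instead of being computed from an image.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for illustration only; not part of the library. */
class DeferredAddSketch {

    // Images waiting to be incorporated, in insertion order.
    private final Map<String, Integer> pending = new LinkedHashMap<>();
    // Category id mapped to the unique ids already incorporated.
    private final Map<Integer, List<String>> categories = new HashMap<>();

    /** Remembers the image under its closest category without changing the categories yet. */
    void queue(String uniqueId, int closestCategory) {
        pending.put(uniqueId, closestCategory);
    }

    /** Moves all queued images into their categories and empties the queue. */
    void recomputeCategories() {
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            categories.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        pending.clear();
    }

    /** Returns a copy of the ids currently mapped to the category. */
    List<String> getImagesInCategory(int category) {
        return new ArrayList<>(
                categories.getOrDefault(category, Collections.<String>emptyList()));
    }
}
```

        In this variant a queued image does not show up in getImagesInCategory until recomputeCategories() has run, which matches the second behavior described above.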