Class Hash

  • All Implemented Interfaces:
    Serializable
    Direct Known Subclasses:
    DifferenceHash.DHash, FuzzyHash

    public class Hash
    extends Object
    implements Serializable
    Hashes are bit-encoded values (e.g. 0101011101) created from images using a hashing algorithm. They enable a quick approximate similarity comparison between images while storing only a fraction of the original data.

    They are created by downscaling the image information, which enables quick comparison between instances produced by the same algorithm. Every bit in the hash usually represents a section of the image containing certain information (hue, brightness, color, frequencies or gradients).

    Since:
    1.0.0 (Serializable since 3.0.0)
    Author:
    Kilian
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected int algorithmId
      Unique identifier of the algorithm and settings used to create the hash
      protected int hashLength
      Number of bits this hash represents.
      protected BigInteger hashValue
      Hash value representation. Hashes are constructed by left shifting BigIntegers with either zero or one, depending on the condition found in the image.
    • Constructor Summary

      Constructors 
      Constructor Description
      Hash​(BigInteger hashValue, int hashLength, int algorithmId)
      Creates a Hash object with the specified hashValue, hashLength and algorithmId.
    • Field Detail

      • algorithmId

        protected int algorithmId
        Unique identifier of the algorithm and settings used to create the hash
      • hashValue

        protected BigInteger hashValue
        Hash value representation. Hashes are constructed by left shifting BigIntegers with either zero or one, depending on the condition found in the image. Leading 0s are truncated, so it is the algorithm's responsibility to add a 1 padding bit at the beginning. For example, the bit patterns in new BigInteger("011011") and new BigInteger("000101") would lose their leading zeros, so a padded value takes the form 1xxxxx.
      • hashLength

        protected int hashLength
        Number of bits this hash represents. Necessary because leading 0 bits are dropped by BigInteger.
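
        The truncation described for hashValue and hashLength can be reproduced with plain BigInteger values. The following is a minimal sketch, not the library's implementation: the class PaddingBitSketch and its encode helper are hypothetical names, and the binary radix argument is an assumption made so that the strings are read as bit patterns.

```java
import java.math.BigInteger;

// Illustrative stand-in, not part of the documented library.
final class PaddingBitSketch {

    // Shifts the bits in from the right, starting from a 1 padding bit
    // so that leading zeros of the payload are not lost.
    static BigInteger encode(boolean[] bits) {
        BigInteger value = BigInteger.ONE; // padding bit
        for (boolean bit : bits) {
            value = value.shiftLeft(1);
            if (bit) {
                value = value.setBit(0);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Without padding, the three leading zeros disappear.
        BigInteger plain = new BigInteger("000101", 2);
        System.out.println(plain.toString(2)); // prints 101
        System.out.println(plain.bitLength()); // prints 3

        // With a padding bit, all six payload bits survive.
        boolean[] bits = {false, false, false, true, false, true};
        BigInteger padded = encode(bits);
        System.out.println(padded.toString(2));      // prints 1000101
        System.out.println(padded.bitLength() - 1);  // prints 6
    }
}
```

        Storing the length separately, as the hashLength field does, is the other way to recover the information that BigInteger discards.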
    • Constructor Detail

      • Hash

        public Hash​(BigInteger hashValue,
                    int hashLength,
                    int algorithmId)
        Creates a Hash object with the specified hashValue, hashLength and algorithmId. To allow safe comparison of different hashes, they have to be generated by the same algorithm.
        Parameters:
        hashValue - The hash value describing the image
        hashLength - the actual bit resolution of the hash. BigInteger truncates leading zero bits, resulting in a loss of length information.
        algorithmId - Unique identifier of the algorithm used to create this hash
    • Method Detail

      • hammingDistance

        public int hammingDistance​(Hash h)
        Calculates the Hamming distance between two hash values. The Hamming distance between two hashes is the number of bit positions in which they differ.

        The Hamming distance falls within [0, bitResolution]. Lower values indicate closer similarity, and identical images must return a score of 0. Conversely, a score of 0 does not mean the images have to be identical!

        A longer hash (higher bitResolution) will increase the average Hamming distance returned. While this method allows for the most accurate fine tuning of the distance, normalizedHammingDistance(Hash) is hash length independent.

        Please be aware that only hashes produced by the same algorithm with the same settings return meaningful results and should be compared. This method checks whether the hashes are compatible. If no additional check is required, see hammingDistanceFast(Hash).

        Parameters:
        h - The hash to calculate the distance to
        Returns:
        distance value in the range [0, hash length]; lower values mean closer similarity
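
        Counting differing bits is commonly done by XOR-ing the two values and counting the 1 bits of the result. The sketch below shows that technique on raw BigInteger values. HammingSketch is a hypothetical class, and the way the checked variant tests compatibility (comparing algorithm ids and throwing IllegalArgumentException) is an assumption, since this page does not state how the library performs the check.

```java
import java.math.BigInteger;

// Illustrative stand-in, not the library's Hash class.
final class HammingSketch {

    // Unchecked variant: XOR leaves a 1 wherever the inputs differ.
    static int hammingDistanceFast(BigInteger a, BigInteger b) {
        return a.xor(b).bitCount();
    }

    // Checked variant. Comparing algorithm ids and the exception type
    // are assumptions made for this sketch.
    static int hammingDistance(BigInteger a, int algorithmIdA,
                               BigInteger b, int algorithmIdB) {
        if (algorithmIdA != algorithmIdB) {
            throw new IllegalArgumentException(
                    "Hashes were created by different algorithms");
        }
        return hammingDistanceFast(a, b);
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("1011011", 2);
        BigInteger b = new BigInteger("1000101", 2);
        // a XOR b = 0011110, which contains four 1 bits.
        System.out.println(hammingDistanceFast(a, b)); // prints 4
        System.out.println(hammingDistanceFast(a, a)); // prints 0
    }
}
```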
      • hammingDistanceFast

        public int hammingDistanceFast​(Hash h)
        Calculates the Hamming distance between two hash values. The Hamming distance between two hashes is the number of bit positions in which they differ.

        The Hamming distance falls within [0, bitResolution]. Lower values indicate closer similarity, and identical images must return a score of 0. Conversely, a score of 0 does not mean the images have to be identical!

        A longer hash (higher bitResolution) will increase the average Hamming distance returned. While this method allows for the most accurate fine tuning of the distance, normalizedHammingDistance(Hash) is hash length independent.

        Please be aware that only hashes produced by the same algorithm with the same settings return meaningful results and should be compared. This method does NOT check whether the hashes are compatible.

        Parameters:
        h - The hash to calculate the distance to
        Returns:
        distance value in the range [0, hash length]; lower values mean closer similarity
        See Also:
        hammingDistance(Hash)
      • hammingDistanceFast

        public int hammingDistanceFast​(BigInteger bInt)
        Calculates the Hamming distance between two hash values. The Hamming distance between two hashes is the number of bit positions in which they differ.

        The Hamming distance falls within [0, bitResolution]. Lower values indicate closer similarity, and identical images must return a score of 0. Conversely, a score of 0 does not mean the images have to be identical!

        A longer hash (higher bitResolution) will increase the average Hamming distance returned. While this method allows for the most accurate fine tuning of the distance, normalizedHammingDistance(Hash) is hash length independent.

        Please be aware that only hashes produced by the same algorithm with the same settings return meaningful results and should be compared. This method does NOT check whether the hashes are compatible.

        Parameters:
        bInt - A big integer representing a hash
        Returns:
        distance value in the range [0, hash length]; lower values mean closer similarity
        See Also:
        hammingDistance(Hash)
      • normalizedHammingDistance

        public double normalizedHammingDistance​(Hash h)
        Calculates the normalized Hamming distance between two hash values. The Hamming distance between two hashes is the number of bit positions in which they differ.

        The normalized Hamming distance falls within [0, 1]. Lower values indicate closer similarity, and identical images must return a score of 0. Conversely, a score of 0 does not mean the images have to be identical!

        See hammingDistance(Hash) for a non-normalized version. Please be aware that only hashes produced by the same algorithm with the same settings return meaningful results and should be compared. This method checks whether the hashes are compatible. If no additional check is required, see normalizedHammingDistanceFast(Hash).

        Parameters:
        h - The hash to calculate the distance to
        Returns:
        distance value in the range [0, 1]; lower values mean closer similarity
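
        A value in [0, 1] that is independent of the hash length can be obtained by dividing the bit difference count by the number of bits. The following sketch assumes that this is the normalization meant here; NormalizedHammingSketch is a hypothetical class and the explicit hashLength parameter stands in for the hashLength field.

```java
import java.math.BigInteger;

// Illustrative stand-in, not the library's Hash class.
final class NormalizedHammingSketch {

    // Assumption: normalization divides the differing bit count by the
    // hash length, which yields a value between 0 and 1.
    static double normalizedHammingDistance(BigInteger a, BigInteger b, int hashLength) {
        if (hashLength <= 0) {
            throw new IllegalArgumentException("hashLength must be positive");
        }
        return a.xor(b).bitCount() / (double) hashLength;
    }

    public static void main(String[] args) {
        // 8 bit hashes differing in 4 positions.
        BigInteger a8 = new BigInteger("11110000", 2);
        BigInteger b8 = new BigInteger("11111111", 2);
        System.out.println(normalizedHammingDistance(a8, b8, 8)); // prints 0.5

        // 16 bit hashes differing in 8 positions give the same score.
        BigInteger a16 = new BigInteger("1111000011110000", 2);
        BigInteger b16 = new BigInteger("1111111111111111", 2);
        System.out.println(normalizedHammingDistance(a16, b16, 16)); // prints 0.5
    }
}
```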
      • normalizedHammingDistanceFast

        public double normalizedHammingDistanceFast​(Hash h)
        Calculates the normalized Hamming distance between two hash values. The Hamming distance between two hashes is the number of bit positions in which they differ.

        The normalized Hamming distance falls within [0, 1]. Lower values indicate closer similarity, and identical images must return a score of 0. Conversely, a score of 0 does not mean the images have to be identical!

        See hammingDistance(Hash) for a non-normalized version. Please be aware that only hashes produced by the same algorithm with the same settings return meaningful results and should be compared. This method does NOT check whether the hashes are compatible.

        Parameters:
        h - The hash to calculate the distance to
        Returns:
        distance value in the range [0, 1]; lower values mean closer similarity
        See Also:
        hammingDistance(Hash)
      • getBit

        public boolean getBit​(int position)
        Check if the bit at the given position is set.
        Parameters:
        position - position of the bit. An index of 0 points to the lowest (rightmost) bit.
        Returns:
        true if the bit is set (1) or false if it's not set (0)
        Throws:
        IllegalArgumentException - if the supplied index is outside the hash bounds
        Since:
        2.0.0
      • getBitUnsafe

        public boolean getBitUnsafe​(int position)
        Check if the bit at the given position of the hash is set. This method does not check the bounds of the supplied argument.
        Parameters:
        position - position of the bit. An index of 0 points to the lowest (rightmost) bit.
        Returns:
        true if the bit is set (1); false if it is not set (0) or the index is bigger than the hash length.
        Throws:
        ArithmeticException - if position is negative
        Since:
        2.0.0
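
        The difference between the checked and the unchecked accessor matches the behaviour of BigInteger.testBit, which returns false for positions beyond the stored bits of a non-negative value and throws ArithmeticException for negative positions. The sketch below assumes the accessors delegate to testBit; BitAccessSketch is a hypothetical class and hashLength is passed explicitly.

```java
import java.math.BigInteger;

// Illustrative stand-in, not the library's Hash class.
final class BitAccessSketch {

    // Checked access: rejects positions outside [0, hashLength).
    static boolean getBit(BigInteger value, int hashLength, int position) {
        if (position < 0 || position >= hashLength) {
            throw new IllegalArgumentException(
                    "Bit position " + position + " is outside the hash bounds");
        }
        return value.testBit(position);
    }

    // Unchecked access: relies on BigInteger.testBit alone.
    static boolean getBitUnsafe(BigInteger value, int position) {
        return value.testBit(position);
    }

    public static void main(String[] args) {
        BigInteger value = new BigInteger("000101", 2); // 6 bit hash
        System.out.println(getBit(value, 6, 0));        // prints true
        System.out.println(getBit(value, 6, 1));        // prints false
        System.out.println(getBitUnsafe(value, 100));   // prints false
    }
}
```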
      • getAlgorithmId

        public int getAlgorithmId()
        Returns the algorithm identifier specifying which algorithm and settings were used to create this hash. The id shall remain constant.
        Returns:
        The algorithm id
      • getHashValue

        public BigInteger getHashValue()
        Returns:
        the base BigInteger holding the hash value
      • toImage

        public BufferedImage toImage​(int blockSize)
        Creates a visual representation of the hash, mapping the hash values to the sections of the rescaled image used to generate the hash, assuming default bit encoding.

        Some hash algorithms may choose to construct their hashes in a non-default manner (e.g. DifferenceHash). In this case toImage(int, HashingAlgorithm) may help to resolve the issue.

        Parameters:
        blockSize - scaling factor of each pixel in the hash. Each bit of the hash will be represented by blockSize*blockSize pixels.
        Returns:
        A black and white image representing the individual bits of the hash
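
        To illustrate how blockSize scales the output, the sketch below paints each bit as a square of blockSize*blockSize pixels. Everything about the layout is an assumption made for illustration: the row-major order starting at the leftmost bit, the square grid, and set bits being white. HashImageSketch is a hypothetical class and does not reproduce the library's default bit encoding.

```java
import java.awt.image.BufferedImage;
import java.math.BigInteger;

// Illustrative stand-in, not the library's Hash class.
final class HashImageSketch {

    static BufferedImage toImage(BigInteger hashValue, int hashLength, int blockSize) {
        // Assumption: bits are arranged on a square grid, row by row.
        int side = (int) Math.ceil(Math.sqrt(hashLength));
        BufferedImage image = new BufferedImage(
                side * blockSize, side * blockSize, BufferedImage.TYPE_INT_RGB);
        for (int i = 0; i < hashLength; i++) {
            // Assumption: the leftmost bit maps to the top left block.
            boolean set = hashValue.testBit(hashLength - 1 - i);
            int rgb = set ? 0xFFFFFF : 0x000000; // white if set, else black
            int x0 = (i % side) * blockSize;
            int y0 = (i / side) * blockSize;
            for (int dy = 0; dy < blockSize; dy++) {
                for (int dx = 0; dx < blockSize; dx++) {
                    image.setRGB(x0 + dx, y0 + dy, rgb);
                }
            }
        }
        return image;
    }

    public static void main(String[] args) {
        // A 4 bit hash with blockSize 2 yields a 4x4 pixel image.
        BufferedImage image = toImage(new BigInteger("1001", 2), 4, 2);
        System.out.println(image.getWidth() + "x" + image.getHeight()); // prints 4x4
    }
}
```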
      • toImage

        public BufferedImage toImage​(int blockSize,
                                     HashingAlgorithm hasher)
        Creates a visual representation of the hash, mapping the hash values to the sections of the rescaled image used to generate the hash.

        Some hash algorithms may choose to construct their hashes in a non-default manner (e.g. DifferenceHash).

        Parameters:
        blockSize - scaling factor of each pixel in the hash. Each bit of the hash will be represented by blockSize*blockSize pixels.
        hasher - the HashingAlgorithm which created this hash.
        Returns:
        A black and white image representing the individual bits of the hash
        Since:
        3.0.0
      • toImage

        public BufferedImage toImage​(int[] bitColorIndex,
                                     javafx.scene.paint.Color[] colors,
                                     int blockSize)
        Creates a visual representation of the hash, mapping the hash values to the sections of the rescaled image used to generate the hash.
        Parameters:
        bitColorIndex - array mapping each bit of the hash to a color of the color array
        colors - array to colorize the pixels
        blockSize - scaling factor of each pixel in the hash. Each bit of the hash will be represented by blockSize*blockSize pixels.
        Returns:
        A colorized image representing the individual bits of the hash
      • getBitResolution

        public int getBitResolution()
        Returns:
        the hash resolution in bits
      • toFile

        public void toFile​(File saveLocation)
                    throws IOException
        Saves this hash to a file for persistent storage. The hash can later be recovered by calling fromFile(File).
        Parameters:
        saveLocation - the file to save the hash to
        Throws:
        IOException - If an error occurs during file access
        Since:
        3.0.0
      • fromFile

        public static Hash fromFile​(File source)
                             throws IOException,
                                    ClassNotFoundException
        Reads a hash from a serialization file and returns it. Only hashes that were saved by the same class using toFile(File) can be read back.
        Parameters:
        source - The file this hash can be read from.
        Returns:
        a hash object
        Throws:
        IOException - If an error occurs during file read
        ClassNotFoundException - if the class used to serialize this hash cannot be found
        Since:
        3.0.0
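
        The exceptions declared by toFile and fromFile are consistent with standard Java object serialization. The round trip below is a sketch under that assumption: StoredHash is a hypothetical stand-in for a serializable hash, and the use of ObjectOutputStream and ObjectInputStream is not confirmed by this page.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.math.BigInteger;

// Illustrative stand-in, not the library's Hash class.
final class HashFileSketch {

    // Hypothetical serializable value holder mirroring the three fields.
    static final class StoredHash implements Serializable {
        private static final long serialVersionUID = 1L;
        final BigInteger hashValue;
        final int hashLength;
        final int algorithmId;

        StoredHash(BigInteger hashValue, int hashLength, int algorithmId) {
            this.hashValue = hashValue;
            this.hashLength = hashLength;
            this.algorithmId = algorithmId;
        }
    }

    static void toFile(StoredHash hash, File saveLocation) throws IOException {
        try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream(saveLocation))) {
            out.writeObject(hash);
        }
    }

    static StoredHash fromFile(File source) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                     new ObjectInputStream(new FileInputStream(source))) {
            return (StoredHash) in.readObject();
        }
    }

    // Writes the hash to a temporary file and reads it back.
    static StoredHash roundTrip(StoredHash hash) {
        try {
            File tmp = File.createTempFile("hash", ".ser");
            tmp.deleteOnExit();
            toFile(hash, tmp);
            return fromFile(tmp);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    public static void main(String[] args) {
        StoredHash restored = roundTrip(new StoredHash(BigInteger.valueOf(5), 6, 42));
        System.out.println(restored.hashValue + " " + restored.hashLength
                + " " + restored.algorithmId); // prints 5 6 42
    }
}
```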
      • toByteArray

        public byte[] toByteArray()
        Return the byte representation of the big integer with the leading zero byte stripped if present. The BigInteger class prepends a sign byte if necessary to indicate the signum of the number. Since our hashes are always positive we can get rid of it and reduce the space requirement in our db by 1 byte.

        To reconstruct the big integer value we can simply prepend a [0x00] byte even if it wasn't present in the first place. The constructor BigInteger(byte[]) will take care of it.

        Returns:
        the byte representation of the big integer without an artificial sign byte.
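
        The sign byte handling can be traced with BigInteger alone. In this sketch SignByteSketch, stripSignByte and restore are hypothetical names; the stripping rule follows the description above, and the restore step prepends a 0x00 byte before calling the BigInteger(byte[]) constructor.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Illustrative stand-in, not the library's Hash class.
final class SignByteSketch {

    // Drops the leading 0x00 byte that BigInteger adds when the highest
    // magnitude bit is set and the number is positive.
    static byte[] stripSignByte(BigInteger value) {
        byte[] bytes = value.toByteArray();
        if (bytes.length > 1 && bytes[0] == 0) {
            return Arrays.copyOfRange(bytes, 1, bytes.length);
        }
        return bytes;
    }

    // Prepends 0x00 unconditionally; an extra leading zero byte does not
    // change the value of a positive number.
    static BigInteger restore(byte[] stripped) {
        byte[] withSign = new byte[stripped.length + 1]; // index 0 stays 0x00
        System.arraycopy(stripped, 0, withSign, 1, stripped.length);
        return new BigInteger(withSign);
    }

    public static void main(String[] args) {
        BigInteger value = BigInteger.valueOf(255);
        System.out.println(value.toByteArray().length); // prints 2
        byte[] stripped = stripSignByte(value);
        System.out.println(stripped.length);            // prints 1
        System.out.println(restore(stripped));          // prints 255
    }
}
```

        The constructor BigInteger(int signum, byte[] magnitude) with a signum of 1 is an alternative that avoids copying the array.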
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object