Class FuzzyHash
- java.lang.Object
- dev.brachtendorf.jimagehash.hash.Hash
- dev.brachtendorf.jimagehash.hash.FuzzyHash
- All Implemented Interfaces:
Serializable
public class FuzzyHash extends Hash
A fuzzy hash is an aggregation of multiple hashes mapped to a single mean hash, which represents the average hash while keeping track of fractional (probability) bits. This kind of composite hash is suited to representing clustered hashes and minimizing the distance to all members added.
To receive reasonable results a fuzzy hash should only contain hashes created by the same algorithm.
Combining three hashes
H1:  1001
H2:  1011
H3:  1111
-------
Res: 1011 (used for hamming distances and getHashValue())
In the above case the first and last bit have a certainty of 100%, while each middle bit matches the result in only 66% of the added hashes. The weighted distance takes those certainties into account, while the original distance method inherited from Hash calculates the distance to the modus (most frequent) bit for each position.
Implnote: Opposed to the original Hash, equals and hashCode are not overwritten, to ensure correct functionality in hash collections after factoring in mutable fields. To check if hashes are equal, calculate the distance between the hashes instead.
- Since:
- 3.0.0
- Author:
- Kilian
- See Also:
- Serialized Form
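The "Combining three hashes" example in the class description can be reproduced with a per-bit majority vote. The following is a minimal sketch in plain Java that does not use the library; the class `MajorityVoteSketch` and its methods are hypothetical names introduced for illustration, and the tie rule (ties resolve to 0) is an assumption.

```java
// Hypothetical helper, not part of the library: a per-bit majority vote.
class MajorityVoteSketch {

    /** Returns the per-bit majority of equally long binary strings such as "1001". */
    static String combine(String... hashes) {
        int length = hashes[0].length();
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int ones = 0;
            for (String h : hashes) {
                if (h.charAt(i) == '1') ones++;
            }
            // Assumption: ties resolve to 0; the library may decide differently.
            result.append(2 * ones > hashes.length ? '1' : '0');
        }
        return result.toString();
    }

    /** Fraction of hashes that agree with the majority bit at the given string index. */
    static double agreement(int index, String... hashes) {
        int ones = 0;
        for (String h : hashes) {
            if (h.charAt(index) == '1') ones++;
        }
        return Math.max(ones, hashes.length - ones) / (double) hashes.length;
    }

    public static void main(String[] args) {
        System.out.println(combine("1001", "1011", "1111")); // prints 1011
        System.out.println(agreement(0, "1001", "1011", "1111")); // all three agree
        System.out.println(agreement(1, "1001", "1011", "1111")); // two of three agree
    }
}
```

The outer bits are unanimous, while each middle bit is shared by only two of the three inputs, which is the 66% figure mentioned above.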
-
-
Field Summary
Fields
Modifier and Type, Field - Description
protected int[] bits - The difference in 1's or 0 bits added for each position.
boolean dirtyBits - Requires an update of the hashValue
-
Fields inherited from class dev.brachtendorf.jimagehash.hash.Hash
algorithmId, hashLength, hashValue
-
-
Method Summary
Modifier and Type, Method - Description
boolean equals(Object obj)
static FuzzyHash fromFile(File source) - Reads a hash from a serialization file and returns it.
int getAddedCount() - Return the number of hashes that currently make up this hash.
boolean getBitUnsafe(int position) - Check if the bit at the given position of the hash is set.
double getCertainty(int position) - Get the certainty of a bit at the given bit position.
BigInteger getHashValue()
double getMaximalError()
double getMaxUncertainty(int bitIndex) - Return the maximum fuzzy distance for this bit.
Hash getUncertaintyHash(double certainty) - Return a simple hash containing only the bits that are below the specified uncertainty.
boolean[] getUncertaintyMask(double certainty) - Return a mask indicating if the bits are above a certain uncertainty.
double getWeightedDistance(int bitIndex, boolean bit) - Gets the weighted distance of this bit in the range [0-1]
int hammingDistance(Hash h) - Calculate the hamming distance of 2 hash values.
int hammingDistanceFast(Hash h) - Calculate the hamming distance of 2 hash values.
int hammingDistanceFast(BigInteger bInt) - Calculate the hamming distance of 2 hash values.
int hashCode()
void merge(Hash hash) - Merge a hash into this object and lazily update the values as required.
void merge(Hash... hash) - Merge multiple hashes into this object and lazily update the values as required.
void mergeFast(FuzzyHash hash) - Merge a fuzzy hash into this object and lazily update the values as required.
void mergeFast(Hash hash) - Merge a hash into this object and lazily update the values as required.
void mergeFast(Hash... hash) - Merge multiple hashes into this object and lazily update the values as required.
double normalizedHammingDistance(Hash h) - Calculate the hamming distance of 2 hash values.
double normalizedHammingDistanceFast(Hash h) - Calculate the hamming distance of 2 hash values.
void reset() - Removing all previous knowledge of hashes added while keeping the current cluster hash in place treating it as the only hash added.
double squaredWeightedDistance(FuzzyHash h) - Calculate the squared normalized weighted distance between two fuzzy hashes. Opposed to the hamming distance the weighted distance takes partial bits into account.
double squaredWeightedDistance(Hash h) - Calculate the squared normalized weighted distance between the supplied hash and this hash.
void subtract(Hash hash) - Subtract a hash from this object and lazily update the values as required.
void subtractFast(Hash hash) - Subtract a hash from this object and lazily update the values as required.
byte[] toByteArray() - Return the byte representation of the big integer with the leading zero byte stripped if present.
void toFile(File saveLocation) - Saves this hash to a file for persistent storage.
BufferedImage toImage(int blockSize) - Creates a visual representation of the hash mapping the hash values to the section of the rescaled image used to generate the hash.
BufferedImage toImage(int blockSize, HashingAlgorithm hashingAlgorithm) - Creates a visual representation of the hash mapping the hash values to the section of the rescaled image used to generate the hash.
String toString()
Hash toUncertaintyHash(Hash source, double certainty) - Escape the given hash the same way getUncertaintyHash(double) escapes this fuzzy hash.
double weightedDistance(FuzzyHash h) - Calculate the normalized weighted distance between two fuzzy hashes. Opposed to the hamming distance the weighted distance takes partial bits into account.
double weightedDistance(Hash h) - Calculate the normalized weighted distance between the supplied hash and this hash.
-
Methods inherited from class dev.brachtendorf.jimagehash.hash.Hash
getAlgorithmId, getBit, getBitResolution, toImage
-
-
-
-
Constructor Detail
-
FuzzyHash
public FuzzyHash()
Create an empty fuzzy hash object with a 0 bit hash length and an undefined algorithm id. These values will be populated as soon as the first hash is added.
-
FuzzyHash
public FuzzyHash(Hash... hashs)
Create a fuzzy hash by merging the supplied hashes together. The hashes are all expected to be created by the same algorithm.
The fuzzy hash will adopt the algorithm id and bit length of the first supplied hash.
- Parameters:
hashs- the hashes to merge.
-
-
Method Detail
-
merge
public void merge(Hash hash)
Merge a hash into this object and lazily update the values as required. A merge operation looks at every individual bit and increments or decrements the counter, prompting a recomputation of the underlying hash value if necessary.
Merging multiple hashes will result in a hash which has the lowest summed distance to its members.
Creating composite hashes is only reasonable for hashes generated by the same algorithm. This method will throw an IllegalArgumentException if the hashes are not created by the same algorithm or are of different length.
- Parameters:
hash- the hash to merge
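The counter and lazy-update behaviour described for merge can be modelled in a few lines. This is a rough sketch under stated assumptions rather than the library's implementation: the class `BitCounterSketch`, its fields and the tie rule are hypothetical, and the length check merely stands in for the compatibility validation mentioned above.

```java
import java.math.BigInteger;

// Hypothetical model of the counter idea; names are illustrative, not the library's.
class BitCounterSketch {
    private final int[] counters;                // ones minus zeros seen per bit position
    private boolean dirty = false;               // true while the cached value is stale
    private BigInteger cached = BigInteger.ZERO; // last computed hash value
    private int added = 0;

    BitCounterSketch(int length) {
        counters = new int[length];
    }

    void merge(BigInteger hash) {
        // Simplified stand-in for the compatibility check described above.
        if (hash.signum() < 0 || hash.bitLength() > counters.length) {
            throw new IllegalArgumentException("incompatible hash");
        }
        for (int i = 0; i < counters.length; i++) {
            counters[i] += hash.testBit(i) ? 1 : -1;
        }
        added++;
        dirty = true; // defer the recomputation until the value is read
    }

    BigInteger value() {
        if (dirty) {
            BigInteger v = BigInteger.ZERO;
            for (int i = 0; i < counters.length; i++) {
                // Assumption: a bit is set only when ones strictly outnumber zeros.
                if (counters[i] > 0) v = v.setBit(i);
            }
            cached = v;
            dirty = false;
        }
        return cached;
    }

    int addedCount() {
        return added;
    }

    public static void main(String[] args) {
        BitCounterSketch sketch = new BitCounterSketch(4);
        sketch.merge(new BigInteger("1001", 2));
        sketch.merge(new BigInteger("1011", 2));
        sketch.merge(new BigInteger("1111", 2));
        System.out.println(sketch.value().toString(2)); // prints 1011
    }
}
```

Reading the value only after all merges means the bit string is rebuilt once, which is the point of updating lazily.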
-
merge
public void merge(Hash... hash)
Merge multiple hashes into this object and lazily update the values as required. A merge operation looks at every individual bit and increments or decrements the counter, prompting a recomputation of the underlying hash value if necessary.
Merging multiple hashes will result in a hash which has the lowest summed distance to its members.
Creating composite hashes is only reasonable for hashes generated by the same algorithm. This method will throw an IllegalArgumentException if the hashes are not created by the same algorithm or are of different length.
- Parameters:
hash- the hash to merge
-
mergeFast
public void mergeFast(Hash hash)
Merge a hash into this object and lazily update the values as required. A merge operation looks at every individual bit and increments or decrements the counter, prompting a recomputation of the underlying hash value if necessary.
Merging multiple hashes is a geometric averaging operation resulting in a hash with the lowest summed distance to its members.
Opposed to merge(Hash), this method does not perform any input validation. If incompatible hashes are added (especially hashes with differing lengths) the instance will be left in an undefined state and future calls to any of the methods may return unpredictable results.
- Parameters:
hash- to merge
-
mergeFast
public void mergeFast(FuzzyHash hash)
Merge a fuzzy hash into this object and lazily update the values as required. A merge operation looks at every individual bit and increments or decrements the counter, prompting a recomputation of the underlying hash value if necessary.
Opposed to merge(Hash), this method does not perform any input validation. If incompatible hashes are added (especially hashes with differing lengths) the instance will be left in an undefined state and future calls to any of the methods may return unpredictable results.
- Parameters:
hash- to merge
-
mergeFast
public void mergeFast(Hash... hash)
Merge multiple hashes into this object and lazily update the values as required. A merge operation looks at every individual bit and increments or decrements the counter, prompting a recomputation of the underlying hash value if necessary.
Merging multiple hashes is a geometric averaging operation resulting in a hash with the lowest summed distance to its members.
Opposed to merge(Hash), this method does not perform any input validation. If incompatible hashes are added (especially hashes with differing lengths) the instance will be left in an undefined state and future calls to any of the methods may return unpredictable results.
- Parameters:
hash- to merge
-
subtract
public void subtract(Hash hash)
Subtract a hash from this object and lazily update the values as required. A subtraction operation looks at every individual bit and increments or decrements the counter, prompting a recomputation of the underlying hash value if necessary.
Creating composite hashes is only reasonable for hashes generated by the same algorithm. This method will throw an IllegalArgumentException if the hashes are not created by the same algorithm or are of different length. Only hashes that were added beforehand should be subtracted.
- Parameters:
hash- to subtract
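Subtraction can be read as the inverse of a merge on the per-bit counters. The sketch below illustrates this in plain Java; `SubtractSketch` and `apply` are hypothetical names, and the counter layout (ones minus zeros per position) is an assumption based on the description of the bits field.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical illustration: subtracting undoes the counter change of a merge.
class SubtractSketch {

    /** Adds (sign = +1) or removes (sign = -1) a hash's contribution to the counters. */
    static void apply(int[] counters, BigInteger hash, int sign) {
        for (int i = 0; i < counters.length; i++) {
            counters[i] += sign * (hash.testBit(i) ? 1 : -1);
        }
    }

    public static void main(String[] args) {
        int[] counters = new int[4];
        apply(counters, new BigInteger("1001", 2), +1); // merge
        apply(counters, new BigInteger("1111", 2), +1); // merge
        apply(counters, new BigInteger("1111", 2), -1); // subtract the same hash again
        // Index 0 is the rightmost bit of 1001.
        System.out.println(Arrays.toString(counters)); // prints [1, -1, -1, 1]
    }
}
```

Subtracting a hash that was never merged would leave the counters describing a set that does not exist, which is why only previously added hashes should be subtracted.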
-
subtractFast
public void subtractFast(Hash hash)
Subtract a hash from this object and lazily update the values as required. A subtraction operation looks at every individual bit and increments or decrements the counter, prompting a recomputation of the underlying hash value if necessary.
Creating composite hashes is only reasonable for hashes generated by the same algorithm. This method will throw an IllegalArgumentException if the hashes are not created by the same algorithm or are of different length. Only hashes that were added beforehand should be subtracted.
- Parameters:
hash- to subtract
-
weightedDistance
public double weightedDistance(Hash h)
Calculate the normalized weighted distance between the supplied hash and this hash. Opposed to the hamming distance, the weighted distance takes partial bits into account. E.g. if this fuzzy hash's first bit has a probability of 70% of being a 0, it will have a weighted distance of .7 if the supplied bit is a 1. Be aware that this method is much more expensive than calculating the simple distance between 2 ordinary hashes (1 quick xor vs multiple calculations per bit).
- Parameters:
h- The hash to calculate the distance to
- Returns:
- similarity value ranging between [0 - 1]
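The 70% example can be worked through numerically. The following sketch is an illustration only and assumes that the per-bit distances are averaged over the hash length to normalize them; `WeightedDistanceSketch` and the `pOne` array (the assumed fraction of merged hashes with a 1 at each position) are hypothetical.

```java
// Hypothetical illustration of a weighted distance between a fuzzy hash and a plain hash.
class WeightedDistanceSketch {

    /** pOne[i] is the assumed fraction of merged hashes that have a 1 at position i. */
    static double weightedDistance(double[] pOne, boolean[] bits) {
        double sum = 0;
        for (int i = 0; i < pOne.length; i++) {
            // Distance to a 1 bit is the share of zeros, and vice versa.
            sum += bits[i] ? 1.0 - pOne[i] : pOne[i];
        }
        return sum / pOne.length; // assumption: normalized by the number of bits
    }

    /** Same idea with each per-bit distance squared before averaging. */
    static double squaredWeightedDistance(double[] pOne, boolean[] bits) {
        double sum = 0;
        for (int i = 0; i < pOne.length; i++) {
            double d = bits[i] ? 1.0 - pOne[i] : pOne[i];
            sum += d * d;
        }
        return sum / pOne.length;
    }

    public static void main(String[] args) {
        double[] pOne = {0.3};    // 70% of the merged hashes have a 0 here
        boolean[] bits = {true};  // the compared hash has a 1
        System.out.println(weightedDistance(pOne, bits));        // roughly 0.7
        System.out.println(squaredWeightedDistance(pOne, bits)); // roughly 0.49
    }
}
```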
-
weightedDistance
public double weightedDistance(FuzzyHash h)
Calculate the normalized weighted distance between two fuzzy hashes. Opposed to the hamming distance, the weighted distance takes partial bits into account. E.g. if this fuzzy hash's first bit has a probability of 70% of being a 0 and the second fuzzy hash has a 60% probability, the distance will be 10%. Be aware that this method is much more expensive than calculating the simple distance between 2 ordinary hashes (1 quick xor vs multiple calculations per bit).
- Parameters:
h- The hash to calculate the distance to
- Returns:
- similarity value ranging between [0 - 1]
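For two fuzzy hashes the per-bit distance in the example is the difference of the two probabilities. A minimal sketch, assuming the per-bit differences are averaged over the hash length; `FuzzyToFuzzySketch` and the `pZero` arrays are hypothetical and not part of the library.

```java
// Hypothetical illustration of a weighted distance between two fuzzy hashes.
class FuzzyToFuzzySketch {

    /** pZeroA[i] and pZeroB[i] are the assumed probabilities of a 0 at position i. */
    static double weightedDistance(double[] pZeroA, double[] pZeroB) {
        double sum = 0;
        for (int i = 0; i < pZeroA.length; i++) {
            sum += Math.abs(pZeroA[i] - pZeroB[i]);
        }
        return sum / pZeroA.length; // assumption: normalized by the number of bits
    }

    /** Squared variant: each per-bit difference is squared before averaging. */
    static double squaredWeightedDistance(double[] pZeroA, double[] pZeroB) {
        double sum = 0;
        for (int i = 0; i < pZeroA.length; i++) {
            double d = pZeroA[i] - pZeroB[i];
            sum += d * d;
        }
        return sum / pZeroA.length;
    }

    public static void main(String[] args) {
        System.out.println(weightedDistance(new double[] {0.7}, new double[] {0.6}));        // roughly 0.1
        System.out.println(squaredWeightedDistance(new double[] {0.7}, new double[] {0.6})); // roughly 0.01
    }
}
```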
-
squaredWeightedDistance
public double squaredWeightedDistance(Hash h)
Calculate the squared normalized weighted distance between the supplied hash and this hash. Opposed to the hamming distance, the weighted distance takes partial bits into account. E.g. if this fuzzy hash's first bit has a probability of 70% of being a 0, it will have a weighted distance of .7^2 if the supplied bit is a 1. Be aware that this method is much more expensive than calculating the simple distance between 2 ordinary hashes (1 quick xor vs multiple calculations per bit).
- Parameters:
h- The hash to calculate the distance to
- Returns:
- similarity value ranging between [0 - 1]
-
squaredWeightedDistance
public double squaredWeightedDistance(FuzzyHash h)
Calculate the squared normalized weighted distance between two fuzzy hashes. Opposed to the hamming distance, the weighted distance takes partial bits into account. E.g. if this fuzzy hash's first bit has a probability of 70% of being a 0 and the second fuzzy hash has a 60% probability, the distance will be 10%^2. Be aware that this method is much more expensive than calculating the simple distance between 2 ordinary hashes (1 quick xor vs multiple calculations per bit).
- Parameters:
h- The hash to calculate the distance to
- Returns:
- similarity value ranging between [0 - 1]
-
getMaximalError
public double getMaximalError()
-
reset
public void reset()
Removes all previous knowledge of the hashes added while keeping the current cluster hash in place, treating it as the only hash added.
This operation is the in-place equivalent of
FuzzyHash fuzzy ... ... ...
FuzzyHash temp = new FuzzyHash();
temp.mergeFast(new Hash(fuzzy.getHashValue(), fuzzy.getBitResolution(), fuzzy.getAlgorithmId()));
fuzzy = temp;
-
getCertainty
public double getCertainty(int position)
Get the certainty of a bit at the given bit position.
The returned value is in the range [-1,1]. A negative value indicates that the bit is more likely to be a 0. A positive value indicates that the bit is more likely to be a 1.
The value relates to the probability that the nth bit of a hash drawn at random from the set of previously merged hashes is 0 or 1.
- Parameters:
position- the bit position.
- Returns:
- The certainty in range of [-1,1].
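One formula that satisfies the stated range and sign convention is (ones - zeros) / number of hashes. The sketch below uses that formula as an assumption; it is not confirmed to be what the library computes, and `CertaintySketch` is a hypothetical name.

```java
import java.math.BigInteger;

// Hypothetical illustration: certainty as (ones - zeros) / number of merged hashes.
class CertaintySketch {

    /** Position 0 is the rightmost bit. */
    static double certainty(int position, BigInteger... hashes) {
        int diff = 0;
        for (BigInteger h : hashes) {
            diff += h.testBit(position) ? 1 : -1;
        }
        return diff / (double) hashes.length;
    }

    public static void main(String[] args) {
        BigInteger h1 = new BigInteger("1001", 2);
        BigInteger h2 = new BigInteger("1011", 2);
        BigInteger h3 = new BigInteger("1111", 2);
        System.out.println(certainty(0, h1, h2, h3)); // all ones: fully certain 1
        System.out.println(certainty(2, h1, h2, h3)); // two zeros, one one: leans towards 0
    }
}
```

Under this assumption a bit that is 1 in two of three hashes has a certainty of 1/3, and an even split yields 0.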
-
getUncertaintyMask
public boolean[] getUncertaintyMask(double certainty)
Return a mask indicating if the bits are above a certain uncertainty.
The certainty of a bit is calculated by comparing the number of 0 bits at position n with the number of 1 bits at position n across all the added hashes. If all hashes agree, the certainty is 100%. If half of the hashes contain a 0 bit and the other half a 1 bit, the bit has a certainty of 0.
Be aware that index 0 refers to the rightmost bit. Printing the array will show the boolean values in reverse order.
This method only returns usable results once at least one hash has been added.
- Parameters:
certainty- the certainty [0-1] up to which bits will be included. In other words, if a bit is more certain than specified by this argument it will not be included in the result.
- Returns:
- a boolean array indicating which bits are uncertain
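The thresholding can be sketched as follows. This is an illustration under assumptions: per-bit certainties are given in [-1,1], their magnitude is compared against the threshold, and the comparison is inclusive; `UncertaintyMaskSketch` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical illustration: flag bits whose certainty magnitude does not exceed a threshold.
class UncertaintyMaskSketch {

    /** certainty[i] is in [-1,1]; index 0 is the rightmost bit. */
    static boolean[] mask(double[] certainty, double threshold) {
        boolean[] uncertain = new boolean[certainty.length];
        for (int i = 0; i < certainty.length; i++) {
            // Assumption: the comparison is inclusive.
            uncertain[i] = Math.abs(certainty[i]) <= threshold;
        }
        return uncertain;
    }

    public static void main(String[] args) {
        double[] certainty = {1.0, 1.0 / 3.0, -1.0 / 3.0, 1.0};
        // Printed in array order, so the rightmost bit comes first.
        System.out.println(Arrays.toString(mask(certainty, 0.5))); // prints [false, true, true, false]
    }
}
```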
-
getUncertaintyHash
public Hash getUncertaintyHash(double certainty)
Return a simple hash containing only the bits that are below the specified uncertainty. This hash can be used to further compare hashes matched to this cluster while ignoring bits that are likely to be similar.
To obtain compatible hashes, use toUncertaintyHash(Hash, double) to escape other hashes the same way.
- Parameters:
certainty- the certainty [0-1] up to which bits will be included. In other words, if a bit is more certain than specified by this argument it will not be included in the result.
- Returns:
- a hash with the certain bits discarded.
-
toUncertaintyHash
public Hash toUncertaintyHash(Hash source, double certainty)
Escape the given hash the same way getUncertaintyHash(double) escapes this fuzzy hash.
Return a simple hash containing only the bits that are below the specified uncertainty of the fuzzy hash. This hash can be used to further compare hashes matched to this cluster while ignoring bits that are likely to be similar.
To get reasonable results the source hash has to be compatible with the fuzzy hash, meaning that it was generated by the same algorithm and settings as the hashes which make up the fuzzy hash.
- Parameters:
source- the hash to convert.
certainty- threshold to indicate which bits to discard
- Returns:
- a hash with the certain bits discarded.
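Discarding the certain bits amounts to keeping only the masked positions of the source hash. The sketch below packs the retained bits into a shorter value; the packing order and the class `UncertaintyHashSketch` are assumptions made for illustration and may differ from the library's behaviour.

```java
import java.math.BigInteger;

// Hypothetical illustration: keep only the bits flagged as uncertain.
class UncertaintyHashSketch {

    /** Packs the bits of source at positions where mask is true, lowest position first. */
    static BigInteger keepMasked(BigInteger source, boolean[] mask) {
        BigInteger out = BigInteger.ZERO;
        int next = 0; // next free bit position in the shortened value
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                if (source.testBit(i)) out = out.setBit(next);
                next++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] mask = {false, true, true, false}; // only the two middle bits are uncertain
        System.out.println(keepMasked(new BigInteger("1011", 2), mask).toString(2)); // prints 1
        System.out.println(keepMasked(new BigInteger("1101", 2), mask).toString(2)); // prints 10
    }
}
```

Applying the same mask to every hash in a cluster keeps the shortened values comparable with each other.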
-
getHashValue
public BigInteger getHashValue()
- Overrides:
getHashValue in class Hash
- Returns:
- the base BigInteger holding the hash value
-
getBitUnsafe
public boolean getBitUnsafe(int position)
Description copied from class: Hash
Check if the bit at the given position of the hash is set. This method does not check the bounds of the supplied argument.
- Overrides:
getBitUnsafe in class Hash
- Parameters:
position- of the bit. An index of 0 points to the lowest (rightmost) bit
- Returns:
- true if the bit is set (1). False if it's not set (0) or the index is bigger than the hash length.
-
hammingDistance
public int hammingDistance(Hash h)
Description copied from class: Hash
Calculate the hamming distance of 2 hash values. The distance of two hashes is the number of individual bits that differ between them.
The hamming distance falls within [0-bitResolution]. Lower values indicate closer similarity, and identical images must return a score of 0. On the flip side, a score of 0 does not mean the images have to be identical!
A longer hash (higher bitResolution) will increase the average hamming distance returned. While this method allows for the most accurate fine tuning of the distance, Hash.normalizedHammingDistance(Hash) is hash length independent.
Please be aware that only hashes produced by the same algorithm with the same settings will return meaningful results and should be compared. This method will check if the hashes are compatible; if no additional check is required, see Hash.hammingDistanceFast(Hash).
- Overrides:
hammingDistance in class Hash
- Parameters:
h- The hash to calculate the distance to
- Returns:
- similarity value ranging between [0 - hash length]
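On raw BigInteger values the hamming distance reduces to an xor followed by a bit count. A minimal sketch using only java.math.BigInteger; the class `HammingSketch` is hypothetical, and dividing by the bit resolution is an assumption about how the normalized variant is derived.

```java
import java.math.BigInteger;

// Hypothetical illustration of the hamming distance on raw hash values.
class HammingSketch {

    /** Number of bit positions in which the two non-negative values differ. */
    static int hamming(BigInteger a, BigInteger b) {
        return a.xor(b).bitCount();
    }

    /** Assumption: the normalized distance divides by the hash length in bits. */
    static double normalized(BigInteger a, BigInteger b, int bitResolution) {
        return hamming(a, b) / (double) bitResolution;
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("1001", 2);
        BigInteger b = new BigInteger("1111", 2);
        System.out.println(hamming(a, b));       // prints 2
        System.out.println(normalized(a, b, 4)); // prints 0.5
    }
}
```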
-
hammingDistanceFast
public int hammingDistanceFast(Hash h)
Description copied from class: Hash
Calculate the hamming distance of 2 hash values. The distance of two hashes is the number of individual bits that differ between them.
The hamming distance falls within [0-bitResolution]. Lower values indicate closer similarity, and identical images must return a score of 0. On the flip side, a score of 0 does not mean the images have to be identical!
A longer hash (higher bitResolution) will increase the average hamming distance returned. While this method allows for the most accurate fine tuning of the distance, Hash.normalizedHammingDistance(Hash) is hash length independent.
Please be aware that only hashes produced by the same algorithm with the same settings will return meaningful results and should be compared. This method will NOT check if the hashes are compatible.
- Overrides:
hammingDistanceFast in class Hash
- Parameters:
h- The hash to calculate the distance to
- Returns:
- similarity value ranging between [0 - hash length]
- See Also:
Hash.hammingDistance(Hash)
-
hammingDistanceFast
public int hammingDistanceFast(BigInteger bInt)
Description copied from class: Hash
Calculate the hamming distance of 2 hash values. The distance of two hashes is the number of individual bits that differ between them.
The hamming distance falls within [0-bitResolution]. Lower values indicate closer similarity, and identical images must return a score of 0. On the flip side, a score of 0 does not mean the images have to be identical!
A longer hash (higher bitResolution) will increase the average hamming distance returned. While this method allows for the most accurate fine tuning of the distance, Hash.normalizedHammingDistance(Hash) is hash length independent.
Please be aware that only hashes produced by the same algorithm with the same settings will return meaningful results and should be compared. This method will NOT check if the hashes are compatible.
- Overrides:
hammingDistanceFast in class Hash
- Parameters:
bInt- A big integer representing a hash
- Returns:
- similarity value ranging between [0 - hash length]
- See Also:
Hash.hammingDistance(Hash)
-
normalizedHammingDistance
public double normalizedHammingDistance(Hash h)
Description copied from class: Hash
Calculate the normalized hamming distance of 2 hash values. The distance of two hashes is the number of individual bits that differ between them.
The normalized hamming distance falls within [0-1]. Lower values indicate closer similarity, and identical images must return a score of 0. On the flip side, a score of 0 does not mean the images have to be identical!
See Hash.hammingDistance(Hash) for a non-normalized version.
Please be aware that only hashes produced by the same algorithm with the same settings will return meaningful results and should be compared. This method will check if the hashes are compatible; if no additional check is required, see Hash.normalizedHammingDistanceFast(Hash).
- Overrides:
normalizedHammingDistance in class Hash
- Parameters:
h- The hash to calculate the distance to
- Returns:
- similarity value ranging between [0 - 1]
-
normalizedHammingDistanceFast
public double normalizedHammingDistanceFast(Hash h)
Description copied from class: Hash
Calculate the normalized hamming distance of 2 hash values. The distance of two hashes is the number of individual bits that differ between them.
The normalized hamming distance falls within [0-1]. Lower values indicate closer similarity, and identical images must return a score of 0. On the flip side, a score of 0 does not mean the images have to be identical!
See Hash.hammingDistance(Hash) for a non-normalized version.
Please be aware that only hashes produced by the same algorithm with the same settings will return meaningful results and should be compared. This method will NOT check if the hashes are compatible.
- Overrides:
normalizedHammingDistanceFast in class Hash
- Parameters:
h- The hash to calculate the distance to
- Returns:
- similarity value ranging between [0 - 1]
- See Also:
Hash.hammingDistance(Hash)
-
toByteArray
public byte[] toByteArray()
Description copied from class: Hash
Return the byte representation of the big integer with the leading zero byte stripped if present. The BigInteger class prepends a sign byte if necessary to indicate the signum of the number. Since our hashes are always positive we can get rid of it and reduce the space requirement in our db by 1 byte.
To reconstruct the big integer value we can simply prepend a [0x00] byte even if it wasn't present in the first place. The constructor BigInteger(byte[]) will take care of it.
- Overrides:
toByteArray in class Hash
- Returns:
- the byte representation of the big integer without an artificial sign byte.
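The sign byte handling can be tried out with java.math.BigInteger alone. The helper names `strip` and `restore` below are hypothetical and only mirror the description; they are not the library's methods.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical illustration of dropping and restoring BigInteger's sign byte.
class SignByteSketch {

    /** Removes the leading 0x00 byte that BigInteger adds when the top bit is set. */
    static byte[] strip(BigInteger value) {
        byte[] raw = value.toByteArray();
        if (raw.length > 1 && raw[0] == 0) {
            return Arrays.copyOfRange(raw, 1, raw.length);
        }
        return raw;
    }

    /** Prepends a 0x00 byte unconditionally so the value is always read as positive. */
    static BigInteger restore(byte[] stripped) {
        byte[] padded = new byte[stripped.length + 1]; // padded[0] stays 0x00
        System.arraycopy(stripped, 0, padded, 1, stripped.length);
        return new BigInteger(padded);
    }

    public static void main(String[] args) {
        BigInteger value = BigInteger.valueOf(255); // toByteArray() yields {0x00, 0xFF}
        byte[] stripped = strip(value);
        System.out.println(stripped.length);   // prints 1
        System.out.println(restore(stripped)); // prints 255
    }
}
```

Without the restored zero byte, new BigInteger(new byte[] {(byte) 0xFF}) would be read as -1, which is why the byte is prepended on reconstruction.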
-
toImage
public BufferedImage toImage(int blockSize)
Creates a visual representation of the hash mapping the hash values to the section of the rescaled image used to generate the hash.
-
toImage
public BufferedImage toImage(int blockSize, HashingAlgorithm hashingAlgorithm)
Description copied from class: Hash
Creates a visual representation of the hash, mapping the hash values to the section of the rescaled image used to generate the hash.
Some hash algorithms may choose to construct their hashes in a non-default manner (e.g. DifferenceHash).
- Overrides:
toImage in class Hash
- Parameters:
blockSize- scaling factor of each pixel in the hash. Each bit of the hash will be represented by blockSize*blockSize pixels
hashingAlgorithm- HashAlgorithm which created this hash.
- Returns:
- A black and white image representing the individual bits of the hash
-
getAddedCount
public int getAddedCount()
Return the number of hashes that currently make up this hash. After calling reset the added count will be set to 1.
- Returns:
- the count
-
getMaxUncertainty
public double getMaxUncertainty(int bitIndex)
Return the maximum fuzzy distance for this bit. The maximum distance is the distance to either the 0 bit or the 1 bit, whichever is greater.
- Parameters:
bitIndex- the index of the bit
- Returns:
- the maximum distance
-
getWeightedDistance
public double getWeightedDistance(int bitIndex, boolean bit)
Gets the weighted distance of this bit in the range [0-1].
- Parameters:
bitIndex- the position in the hash
bit- if true, the distance to a 1 bit will be calculated; if false, the distance to a 0 bit.
- Returns:
- the distance of a fully probable bit to this hash's bit.
-
toFile
public void toFile(File saveLocation) throws IOException
Description copied from class: Hash
Saves this hash to a file for persistent storage. The hash can later be recovered by calling Hash.fromFile(File).
- Overrides:
toFile in class Hash
- Parameters:
saveLocation- the file to save the hash to
- Throws:
IOException- If an error occurs during file access
-
fromFile
public static FuzzyHash fromFile(File source) throws IOException, ClassNotFoundException
Reads a hash from a serialization file and returns it. Only hashes that were saved by the same class using toFile(File) can be read from file.
- Parameters:
source- The file this hash can be read from.
- Returns:
- a hash object
- Throws:
IOException- If an error occurs during file read
ClassNotFoundException- if the class used to serialize this hash can not be found
- Since:
- 3.0.0
-
-