Class DatabaseImageMatcher

  • All Implemented Interfaces:
    Serializable, AutoCloseable
    Direct Known Subclasses:
    H2DatabaseImageMatcher

    public class DatabaseImageMatcher
    extends TypedImageMatcher
    implements Serializable, AutoCloseable
    A naive database-based image matcher implementation. Images indexed by this matcher are added to the database and retrieved when an image match is queried.

    The image matcher supports chaining multiple hashing steps, which are invoked in the order the algorithms were added. Once a hashing algorithm fails to match a specific image, the image is discarded, pruning the search tree quickly.
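
    The pruning behaviour of chained steps can be sketched without the library. The following is a minimal illustration under stated assumptions, not the matcher's actual implementation: ChainedFilterSketch, Step and match are hypothetical names, hashes are modelled as BigInteger values, and each step's stored hashes live in an in-memory map instead of a database table.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names are part of the library.
final class ChainedFilterSketch {

    /** One hashing step: the hashes it stored per image id and its distance threshold. */
    static final class Step {
        final Map<String, BigInteger> storedHashes;
        final int maxDistance;

        Step(Map<String, BigInteger> storedHashes, int maxDistance) {
            this.storedHashes = storedHashes;
            this.maxDistance = maxDistance;
        }
    }

    /** Number of differing bits between two hashes. */
    static int hammingDistance(BigInteger a, BigInteger b) {
        return a.xor(b).bitCount();
    }

    /**
     * Applies the steps in insertion order. A candidate survives a step only if
     * the step has a hash for it and that hash is within the step's threshold.
     */
    static List<String> match(List<Step> steps, List<BigInteger> queryHashes, List<String> candidates) {
        List<String> survivors = new ArrayList<>(candidates);
        for (int i = 0; i < steps.size() && !survivors.isEmpty(); i++) {
            Step step = steps.get(i);
            BigInteger query = queryHashes.get(i);
            survivors.removeIf(id -> {
                BigInteger stored = step.storedHashes.get(id);
                return stored == null || hammingDistance(stored, query) > step.maxDistance;
            });
        }
        return survivors;
    }

    /** Image "b" is pruned by the first step; only "a" reaches and passes the second. */
    static List<String> demo() {
        Step first = new Step(Map.of("a", BigInteger.valueOf(0b1111), "b", BigInteger.valueOf(0b0000)), 1);
        Step second = new Step(Map.of("a", BigInteger.valueOf(0b1010)), 0);
        List<BigInteger> query = List.of(BigInteger.valueOf(0b1110), BigInteger.valueOf(0b1010));
        return match(List.of(first, second), query, List.of("a", "b"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

    Because a candidate rejected by an early step is never looked at again, cheap or coarse algorithms are usually best added first.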

    In contrast to the ConsecutiveMatcher, this matcher does not store a reference to the image data itself but only keeps track of the hash and the url of the image file. Additionally, if hashing algorithms are added after images have been hashed, those images will not be found until they are reindexed.

    Multiple database image matchers may use the same database, in which case hashes created by the same hashing algorithm will be used by both matchers.

     
     
     DatabaseImageMatcher matcher0, matcher1;

     // Both matchers share the AverageHash(32) step
     matcher0.addHashingAlgorithm(new AverageHash(32),...,...);
     matcher1.addHashingAlgorithm(new AverageHash(32),...,...);

     // Only matcher0 has a second step
     matcher0.addHashingAlgorithm(new AverageHash(24),...,...);

     matcher0.addImage(Image1);
     
     
    From this point on, matcher1 is also able to match against Image1. Be aware that this relationship is not symmetric: images added by calling matcher1.addImage(..) will be matched at the first step in matcher0, but no hash will be found for AverageHash(24), so the image is discarded as a possible match.

    If this behaviour is not desired, simply choose a different database for each image matcher.
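
    The asymmetry can be reproduced with a small in-memory model. This is only a sketch under assumptions: SharedDatabaseSketch, Database, Matcher and canFind are hypothetical names, the "database" is a map from table name to the set of ids that have a hash in that table, and the actual hash comparison is left out so that only the presence of an entry per algorithm is modelled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; none of these names are part of the library.
final class SharedDatabaseSketch {

    /** Stand-in for the database: table name -> ids that have a hash in that table. */
    static final class Database {
        final Map<String, Set<String>> tables = new HashMap<>();
    }

    static final class Matcher {
        private final Database db;
        private final List<String> algorithms;

        Matcher(Database db, List<String> algorithms) {
            this.db = db;
            this.algorithms = algorithms;
        }

        /** Writes an entry into the table of every algorithm this matcher uses. */
        void addImage(String uniqueId) {
            for (String algorithm : algorithms) {
                db.tables.computeIfAbsent(algorithm, k -> new HashSet<>()).add(uniqueId);
            }
        }

        /** An image can only be matched if every algorithm of this matcher has an entry for it. */
        boolean canFind(String uniqueId) {
            for (String algorithm : algorithms) {
                if (!db.tables.getOrDefault(algorithm, Set.of()).contains(uniqueId)) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Returns { matcher1 finds Image1, matcher0 finds Image2 }. */
    static boolean[] demo() {
        Database shared = new Database();
        Matcher matcher0 = new Matcher(shared, List.of("AverageHash(32)", "AverageHash(24)"));
        Matcher matcher1 = new Matcher(shared, List.of("AverageHash(32)"));
        matcher0.addImage("Image1");
        matcher1.addImage("Image2");
        return new boolean[] { matcher1.canFind("Image1"), matcher0.canFind("Image2") };
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("matcher1 finds Image1: " + result[0]);
        System.out.println("matcher0 finds Image2: " + result[1]);
    }
}
```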

    2 + n tables are generated to save values:

    1. ImageHasher(id,serialize): Allows an image matcher to be serialized to the database
    2. HashingAlgos(id,keyLenght): Saves the bit resolution of each hashing algorithm
    3 ... 2 + n. One table for each hashing algorithm used in an image matcher

    For each and every match, the hashes have to be read from the database. This allows hashes to be stored persistently but might not be as efficient as the ConsecutiveMatcher. Possible optimizations include keeping level 0 or level 1 hashes (hashes created by the first invoked hashing algorithms) in memory and only retrieving the later hashes from the database.

    Since:
    2.0.2 added, 3.0.0 extracted H2 database image matcher into its own class
    Author:
    Kilian
    See Also:
    Serialized Form
    • Field Detail

      • conn

        protected transient Connection conn
        Database connection. Maybe use connection pooling?
    • Constructor Detail

      • DatabaseImageMatcher

        public DatabaseImageMatcher​(Connection connection)
                             throws SQLException
        Attempts to establish a connection to the given database using the supplied connection object. If the database does not yet exist, an empty database will be initialized.
        Parameters:
        connection - the database connection
        Throws:
        SQLException - if a database access error occurs
        SQLTimeoutException - when the driver has determined that the timeout value specified by the setLoginTimeout method has been exceeded and has at least tried to cancel the current database connection attempt
    • Method Detail

      • getFromDatabase

        public static DatabaseImageMatcher getFromDatabase​(Connection conn,
                                                           int id)
                                                    throws SQLException
        Get a database image matcher which was previously serialized using serializeToDatabase(int). If the serialized matcher does not exist, the connection will be closed.
        Parameters:
        conn - the database connection
        id - the id supplied to the serializeToDatabase(int) call
        Returns:
        the image matcher found in the database or null if not present
        Throws:
        SQLException - if an SQL exception occurs
      • initialize

        protected void initialize​(Connection conn)
                           throws SQLException
        Create the default tables used if they do not yet exist.
        Parameters:
        conn - The database connection
        Throws:
        SQLException - if an sql error occurs
      • addHashingAlgorithm

        public void addHashingAlgorithm​(HashingAlgorithm algo,
                                        double threshold)
        Append a new hashing algorithm which will be executed after all previously added hashing algorithms have passed the test.

        The same algorithm may only be added once. Attempts to add an identical algorithm will instead update the settings of the old instance.

        This method assumes the normalized hamming distance. If the non-normalized distance shall be used, take a look at TypedImageMatcher.addHashingAlgorithm(HashingAlgorithm, double, boolean). Throws a wrapped SQL exception as RuntimeException if an SQL error occurs during table creation.

        Overrides:
        addHashingAlgorithm in class TypedImageMatcher
        Parameters:
        algo - The algorithm to be added
        threshold - maximum normalized hamming distance between hashes in order to pass as identical image
      • addHashingAlgorithm

        public void addHashingAlgorithm​(HashingAlgorithm algo,
                                        double threshold,
                                        boolean normalized)
        Append a new hashing algorithm which will be executed after all previously added hashing algorithms have passed the test.

        The same algorithm may only be added once to an image hasher. Attempts to add an identical algorithm will instead update the settings of the old instance. Throws a wrapped SQL exception as RuntimeException if an SQL error occurs during table creation.

        Overrides:
        addHashingAlgorithm in class TypedImageMatcher
        Parameters:
        algo - The algorithm to be added
        threshold - the maximum hamming distance two hashes may have in order to pass as identical images.
        normalized - Whether the normalized or default hamming distance shall be used. The normalized hamming distance will be in the range [0, 1] while the default hamming distance depends on the length of the hash
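
        To make the difference between the two distance modes concrete, here is a minimal sketch that relies only on the JDK. It is an illustration, not the library's code: HammingDistanceSketch and both method names are hypothetical, hashes are modelled as BigInteger values, and bitResolution stands for the hash length in bits.

```java
import java.math.BigInteger;

// Hypothetical illustration; none of these names are part of the library.
final class HammingDistanceSketch {

    /** Default hamming distance: the number of differing bits, in the range [0, hash length]. */
    static int hammingDistance(BigInteger a, BigInteger b) {
        return a.xor(b).bitCount();
    }

    /** Normalized hamming distance: differing bits divided by the hash length, in the range [0, 1]. */
    static double normalizedHammingDistance(BigInteger a, BigInteger b, int bitResolution) {
        return hammingDistance(a, b) / (double) bitResolution;
    }

    public static void main(String[] args) {
        BigInteger a = BigInteger.valueOf(0b11110000);
        BigInteger b = BigInteger.valueOf(0b11111111);
        System.out.println(hammingDistance(a, b));              // 4 differing bits
        System.out.println(normalizedHammingDistance(a, b, 8)); // 4 / 8 = 0.5
    }
}
```

        A normalized threshold keeps its meaning when the bit resolution of an algorithm changes, whereas a default threshold of 4 is strict for a 64 bit hash and lenient for an 8 bit hash.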
      • addImage

        public void addImage​(File imageFile)
                      throws IOException,
                             SQLException
        Index the image. This enables the image matcher to find the image in future searches. The database image matcher does not store the image data itself but indexes the hash bound to the absolute path of the image.

        The path of the file has to be unique in order for this operation to return deterministic results. Otherwise the image will only be added to the database for those hashing algorithms for which no entry exists yet.

        This is useful when you want to add an additional hashing algorithm to the database image matcher, but it will leave the database in an inconsistent state if the unique id is used multiple times.

        Parameters:
        imageFile - The image whose hash will be added to the matcher
        Throws:
        IOException - if an error occurs while reading the file
        SQLException - if an SQL error occurs
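
        The effect of reusing an id can be sketched with plain maps. This is a simplified model under assumptions, not the matcher's implementation: DuplicateIdSketch and its methods are hypothetical names, each table maps a unique id to a stored hash, and a single number per image stands in for the hash every algorithm would compute.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; none of these names are part of the library.
final class DuplicateIdSketch {

    /** Table name -> (unique id -> stored hash). */
    private final Map<String, Map<String, Long>> tables = new LinkedHashMap<>();

    void addAlgorithm(String name) {
        tables.putIfAbsent(name, new HashMap<>());
    }

    /** Stores the hash only in tables that do not yet contain an entry for the id. */
    void addImage(String uniqueId, long hash) {
        for (Map<String, Long> table : tables.values()) {
            table.putIfAbsent(uniqueId, hash);
        }
    }

    Long storedHash(String algorithm, String uniqueId) {
        return tables.get(algorithm).get(uniqueId);
    }

    /** Returns the hashes stored under the same id in the first and the second table. */
    static long[] demo() {
        DuplicateIdSketch sketch = new DuplicateIdSketch();
        sketch.addAlgorithm("first");
        sketch.addImage("id", 1L);     // image A
        sketch.addAlgorithm("second");
        sketch.addImage("id", 2L);     // image B reuses the id of image A
        return new long[] { sketch.storedHash("first", "id"), sketch.storedHash("second", "id") };
    }

    public static void main(String[] args) {
        long[] hashes = demo();
        System.out.println("first table: " + hashes[0] + ", second table: " + hashes[1]);
    }
}
```

        Re-adding the same image under the same id after a new algorithm was registered fills in the missing table, which is the intended use. Adding a different image under that id leaves the tables describing two different images.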
      • addImage

        public void addImage​(String uniqueId,
                             File imageFile)
                      throws IOException,
                             SQLException
        Index the image. This enables the image matcher to find the image in future searches. The database image matcher does not store the image data itself but indexes the hash bound to the absolute path of the image.

        The uniqueId has to be globally unique in order for this operation to return deterministic results. Otherwise the image will only be added to the database for those hashing algorithms for which no entry exists yet.

        This is useful when you want to add an additional hashing algorithm to the database image matcher, but it will leave the database in an inconsistent state if the unique id is used multiple times.

        Parameters:
        uniqueId - a unique identifier returned if querying for the image
        imageFile - The image whose hash will be added to the matcher
        Throws:
        IOException - if an error occurs while reading the file
        SQLException - if an SQL error occurs
        Since:
        2.0.2
      • addImages

        public void addImages​(File... images)
                       throws IOException,
                              SQLException
        Index the images. This enables the image matcher to find the images in future searches. The database image matcher does not store the image data itself but indexes the hash bound to the absolute path of the image.

        The paths of the files have to be unique in order for this operation to return deterministic results. Otherwise an image will only be added to the database for those hashing algorithms for which no entry exists yet.

        This is useful when you want to add an additional hashing algorithm to the database image matcher, but it will leave the database in an inconsistent state if a unique id is used multiple times.

        Parameters:
        images - The images whose hash will be added to the matcher
        Throws:
        IOException - if an error occurs while reading a file
        SQLException - if an SQL error occurs
      • addImages

        public void addImages​(String[] uniqueIds,
                              File[] images)
                       throws IOException,
                              SQLException
        Index the images. This enables the image matcher to find the image in future searches. The database image matcher does not store the image data itself but indexes the hash bound to the absolute path of the image.

        The uniqueIds have to be globally unique in order for this operation to return deterministic results. Otherwise an image will only be added to the database for those hashing algorithms for which no entry exists yet.

        This is useful when you want to add an additional hashing algorithm to the database image matcher, but it will leave the database in an inconsistent state if a unique id is used multiple times.

        Parameters:
        uniqueIds - the unique identifiers returned when querying for the images
        images - The images whose hash will be added to the matcher
        Throws:
        IOException - if an error occurs while reading a file
        SQLException - if an SQL error occurs
        IllegalArgumentException - if uniqueIds and images don't have the same length
        Since:
        2.0.2
      • addImage

        public void addImage​(String uniqueId,
                             BufferedImage image)
                      throws SQLException
        Index the image. This enables the image matcher to find the image in future searches. The database image matcher does not store the image data itself but indexes the hash bound to a user supplied string.

        The uniqueId has to be globally unique in order for this operation to return deterministic results. Otherwise the image will only be added to the database for those hashing algorithms for which no entry exists yet.

        This is useful when you want to add an additional hashing algorithm to the database image matcher, but it will leave the database in an inconsistent state if the unique id is used multiple times.

        Parameters:
        uniqueId - a unique identifier returned if querying for the image
        image - The image to hash
        Throws:
        SQLException - if an SQL error occurs
      • addImages

        public void addImages​(String[] uniqueIds,
                              BufferedImage[] images)
                       throws SQLException
        Index the images. This enables the image matcher to find the image in future searches. The database image matcher does not store the image data itself but indexes the hash bound to a user supplied string.

        The uniqueIds have to be globally unique in order for this operation to return deterministic results. Otherwise an image will only be added to the database for those hashing algorithms for which no entry exists yet.

        This is useful when you want to add an additional hashing algorithm to the database image matcher, but it will leave the database in an inconsistent state if a unique id is used multiple times.

        Parameters:
        uniqueIds - the unique identifiers returned when querying for the images
        images - The images to hash
        Throws:
        SQLException - if an SQL error occurs
        IllegalArgumentException - if uniqueIds and images don't have the same length
        Since:
        2.0.2
        See Also:
        addImage(String, BufferedImage)
      • serializeToDatabase

        public void serializeToDatabase​(int id)
                                 throws SQLException
        Serialize this image matcher to the database. The image matcher object can later be retrieved by calling getFromDatabase(Connection, int).
        Parameters:
        id - The id this image matcher object will be associated with
        Throws:
        SQLException - if an SQL error occurs
      • removeHashingAlgo

        public boolean removeHashingAlgo​(HashingAlgorithm algo,
                                         boolean forceTableDeletion)
                                  throws SQLException
        Removes the hashing algorithm from the image matcher.
        Parameters:
        algo - The algorithm to remove
        forceTableDeletion - if true, also delete all hashes in the database created by this particular algorithm. If false, keep the table and the stored hashes. Use this option with caution if two or more image matchers use the same database.
        Returns:
        true if the algorithm was removed. False if it wasn't present
        Throws:
        SQLException - if connection to the database failed. An SQL exception can only be thrown if forceTableDeletion is set to true. Even if an exception is thrown the algorithm will be removed from this particular image matcher object.
      • clearHashingAlgorithms

        public void clearHashingAlgorithms​(boolean forceTableDeletion)
                                    throws SQLException
        Removes all hashing algorithms from the image matcher.
        Parameters:
        forceTableDeletion - if true, also delete all hashes in the database created by these algorithms. If false, keep the tables and the stored hashes. Use this option with caution if two or more image matchers use the same database.
        Throws:
        SQLException - if connection to the database failed. An SQL exception can only be thrown if forceTableDeletion is set to true. Even if an exception is thrown the algorithms will be removed from this particular image matcher object.
      • getAllMatchingImages

        public Map<String,​PriorityQueue<Result<String>>> getAllMatchingImages()
                                                                             throws SQLException
        Return all images stored in the database which are considered matches to other images in the database.

        Be aware that, depending on the number of images in the database, this operation can be very expensive.

        Returns:
        A Map containing a queue which points to matched images
         Key: unique id of image U1
         Value: images considered matches to U1

        The matched images are unique ids/file paths sorted by the hamming distance of the last applied algorithm
        Throws:
        SQLException - if an SQL error occurs
        Since:
        2.0.2
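
        The shape of the returned value can be sketched with JDK types only. The following is an illustration under assumptions, not the library's Result class: MatchQueueSketch, Match and drainInOrder are hypothetical names, and the queue is assumed to order matches by ascending distance. Note that a PriorityQueue only guarantees this order when elements are polled; iterating over it may visit elements in a different order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration; none of these names are part of the library.
final class MatchQueueSketch {

    /** Stand-in for a match result: the id of the matched image and its distance. */
    static final class Match {
        final String uniqueId;
        final double distance;

        Match(String uniqueId, double distance) {
            this.uniqueId = uniqueId;
            this.distance = distance;
        }
    }

    /** Polls a copy of the queue so the caller's queue stays untouched. */
    static List<String> drainInOrder(PriorityQueue<Match> queue) {
        PriorityQueue<Match> copy = new PriorityQueue<>(queue); // keeps the same ordering
        List<String> ids = new ArrayList<>();
        while (!copy.isEmpty()) {
            ids.add(copy.poll().uniqueId);
        }
        return ids;
    }

    /** Builds a one-entry result map and returns the matches of "a.png", closest first. */
    static List<String> demo() {
        Map<String, PriorityQueue<Match>> allMatches = new HashMap<>();
        PriorityQueue<Match> matches =
                new PriorityQueue<>(Comparator.comparingDouble((Match m) -> m.distance));
        matches.add(new Match("c.png", 0.30));
        matches.add(new Match("a.png", 0.00)); // the image itself, distance 0
        matches.add(new Match("b.png", 0.12));
        allMatches.put("a.png", matches);
        return drainInOrder(allMatches.get("a.png"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```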
      • getMatchingImagesWithinDistance

        public PriorityQueue<Result<String>> getMatchingImagesWithinDistance​(BufferedImage image,
                                                                             double[] normalizedDistance)
                                                                      throws SQLException
        Search for all similar images passing the algorithm filters supplied to this matcher. If the image itself was added to the matcher, it will be returned with a distance of 0.

        This method effectively circumvents the algorithm settings and should be used sparingly and only when you know what you are doing. Usually you may want to use instead.

        Parameters:
        image - The image to search matches for
        normalizedDistance - the distance used for the algorithms
        Returns:
        Return all unique ids/file paths sorted by the hamming distance of the last applied algorithms
        Throws:
        SQLException - if an SQL error occurs
        Since:
        2.0.2
      • getMatchingImages

        public PriorityQueue<Result<String>> getMatchingImages​(File imageFile)
                                                        throws SQLException,
                                                               IOException
        Search for all similar images passing the algorithm filters supplied to this matcher. If the image itself was added to the matcher, it will be returned with a distance of 0.
        Parameters:
        imageFile - The image other images will be matched against
        Returns:
        Return all unique ids/file paths sorted by the hamming distance of the last applied algorithms
        Throws:
        SQLException - if an SQL error occurs
        IOException - if an error occurs when reading the file
      • getMatchingImages

        public PriorityQueue<Result<String>> getMatchingImages​(BufferedImage image)
                                                        throws SQLException
        Search for all similar images passing the algorithm filters supplied to this matcher. If the image itself was added to the matcher, it will be returned with a distance of 0.
        Parameters:
        image - The image other images will be matched against
        Returns:
        Return all unique ids/file paths sorted by the hamming distance of the last applied algorithms
        Throws:
        SQLException - if an SQL error occurs
      • getSimilarImages

        protected List<Result<String>> getSimilarImages​(Hash targetHash,
                                                        int maxDistance,
                                                        HashingAlgorithm hasher)
                                                 throws SQLException
        Return all url descriptors which describe images within the provided hamming distance of the supplied hash
        Parameters:
        targetHash - The hash to check the database against
        maxDistance - The maximum distance the hashes may have
        hasher - the hashing algorithm used to identify the table
        Returns:
        all urls within maxDistance of the supplied hash
        Throws:
        SQLException - if an SQL error occurs
      • createHashTable

        protected void createHashTable​(HashingAlgorithm hasher)
                                throws SQLException
        Create a table to hold image hashes for a particular image hashing algorithm
        Parameters:
        hasher - the hashing algorithm
        Throws:
        SQLException - if an SQL error occurs
      • doesTableExist

        protected boolean doesTableExist​(String tableName)
                                  throws SQLException
        Query if the database contains a table with the given name
        Parameters:
        tableName - The table name to check for
        Returns:
        true if a table with the name exists, false otherwise
        Throws:
        SQLException - if an SQL error occurs
      • resolveTableName

        protected String resolveTableName​(HashingAlgorithm hashAlgo)
        Map a hashing algorithm to a table name
        Parameters:
        hashAlgo - The hashing algorithm
        Returns:
        the name of the table in which hashes produced by this algorithm are saved
      • reconstructHashFromDatabase

        protected Hash reconstructHashFromDatabase​(HashingAlgorithm hasher,
                                                   byte[] bytes)
        Reconstruct a hash value from the database
        Parameters:
        hasher - The hashing algorithm used to create the hash
        bytes - the byte array stored in the database
        Returns:
        a hash value for which equals returns true when compared to the hash object saved in the database
        Since:
        2.0.2
      • doesEntryExist

        public boolean doesEntryExist​(String uniqueId,
                                      HashingAlgorithm hashAlgo)
                               throws SQLException
        Check if an entry with the given uniqueId already exists
        Parameters:
        uniqueId - the unique id to check against
        hashAlgo - the hashing algorithm
        Returns:
        true if an entry with the given uniqueId exists, false otherwise
        Throws:
        SQLException - if an SQL error occurs
        Since:
        2.1.0