Class BinarySearcherOverBlockSource

java.lang.Object
org.aksw.commons.io.hadoop.binseach.v2.BinarySearcherOverBlockSource
All Implemented Interfaces:
AutoCloseable, org.aksw.commons.io.binseach.BinarySearcher

public class BinarySearcherOverBlockSource extends Object implements org.aksw.commons.io.binseach.BinarySearcher
Binary search implementation that finds lines starting with a given prefix in a 'block' source, such as a bzip2-compressed file.
  • Field Details

    • blockSource

      protected BlockSource blockSource
    • cacheSupplier

      protected Supplier<org.aksw.commons.io.hadoop.binseach.v2.BinSearchResourceCache.CacheEntry> cacheSupplier
  • Constructor Details

    • BinarySearcherOverBlockSource

      public BinarySearcherOverBlockSource(BlockSource blockSource, Supplier<org.aksw.commons.io.hadoop.binseach.v2.BinSearchResourceCache.CacheEntry> cacheSupplier)
  • Method Details

    • close

      public void close() throws Exception
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface org.aksw.commons.io.binseach.BinarySearcher
      Throws:
      Exception
    • search

      public InputStream search(byte[] prefix) throws IOException
      Specified by:
      search in interface org.aksw.commons.io.binseach.BinarySearcher
      Throws:
      IOException
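
A minimal sketch of how a caller might consume the stream returned by search. It assumes, based on the class description, that the stream contains the lines matching the prefix. PrefixSearcher and inMemory are hypothetical stand-ins defined only for this example; they mirror the documented search(byte[]) and close() signatures and are not part of the library.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SearchUsageSketch {

    /** Hypothetical stand-in mirroring only the documented search(byte[]) and close() signatures. */
    interface PrefixSearcher extends AutoCloseable {
        InputStream search(byte[] prefix) throws IOException;
    }

    /** In-memory stand-in (NOT the library's implementation): keeps the lines that start with the prefix. */
    static PrefixSearcher inMemory(List<String> sortedLines) {
        return new PrefixSearcher() {
            @Override
            public InputStream search(byte[] prefix) {
                String p = new String(prefix, StandardCharsets.UTF_8);
                StringBuilder sb = new StringBuilder();
                for (String line : sortedLines) {
                    if (line.startsWith(p)) {
                        sb.append(line).append('\n');
                    }
                }
                return new ByteArrayInputStream(sb.toString().getBytes(StandardCharsets.UTF_8));
            }

            @Override
            public void close() {
                // Nothing to release in this in-memory stand-in.
            }
        };
    }

    /** Reads every line from the stream returned by search, then closes the reader and the searcher. */
    static List<String> collectMatches(PrefixSearcher searcher, String prefix) throws Exception {
        List<String> result = new ArrayList<>();
        try (PrefixSearcher s = searcher;
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                     s.search(prefix.getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = List.of("apple 1", "banana 2", "banana 3", "cherry 4");
        System.out.println(collectMatches(inMemory(lines), "banana"));
    }
}
```

Because the searcher is AutoCloseable and the result is an InputStream, both fit into a single try-with-resources statement, which closes the reader first and the searcher last.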
    • parallelSearch

      public Stream<org.aksw.commons.io.input.ReadableChannelSupplier<byte[]>> parallelSearch(byte[] prefix) throws IOException
      Specified by:
      parallelSearch in interface org.aksw.commons.io.binseach.BinarySearcher
      Throws:
      IOException
    • binarySearch

      public static Match binarySearch(BlockSource blockSource, byte[] prefix, BinSearchLevelCache cache) throws IOException
      Throws:
      IOException
    • adjustStart

      public static long adjustStart(BlockSource blockSource, long start, int depth, BinSearchLevelCache cache) throws IOException
      Adjusts a position to the start of the next block. This function must be idempotent: adjustStart(adjustStart(offset)) = adjustStart(offset)
      Throws:
      IOException
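
The idempotence requirement implies that every value the function returns must map to itself, so an offset that already sits on a block start has to stay where it is. The sketch below illustrates this with a sorted array of block start offsets. The array, the end sentinel, and the simplified signature are assumptions made for the example and do not reflect how the library locates blocks.

```java
import java.util.Arrays;

final class AdjustStartSketch {

    /**
     * Returns the smallest block start that is >= offset, or end if there is none.
     * blockStarts must be sorted ascending and end must be greater than every block start.
     * Both parameters are hypothetical stand-ins for what a BlockSource would provide.
     */
    static long adjustStart(long[] blockStarts, long offset, long end) {
        int idx = Arrays.binarySearch(blockStarts, offset);
        if (idx < 0) {
            idx = -idx - 1; // insertion point: index of the first block start greater than offset
        }
        return idx < blockStarts.length ? blockStarts[idx] : end;
    }

    public static void main(String[] args) {
        long[] blockStarts = {0L, 900L, 1800L, 2700L};
        long end = 3000L;
        long once = adjustStart(blockStarts, 1000L, end);
        long twice = adjustStart(blockStarts, once, end);
        System.out.println(once + " " + twice); // prints "1800 1800"
    }
}
```

Returning end rather than a negative marker when no further block exists keeps the function idempotent at the tail as well, since end itself maps to end.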
    • binarySearch

      public static Match binarySearch(BlockSource blockSource, SearchMode searchMode, int depth, long start, long startAfter, long end, byte delimiter, byte[] prefix, BinSearchLevelCache cache) throws IOException
      When this method returns, the input stream's position is unspecified.

      Note on cache semantics:

      1. The disposition maps the current offset to the offset of the next block (NOT the record).
      2. The header map maps the block id to the first record.
      Parameters:
      blockSource - The block source on which to perform binary search for the given prefix.
      searchMode - Whether we are searching the initial match, or the start or end of a run of matches.
      start -
      startAfter - The adjusted start for the offset (start + 1)
      end -
      delimiter -
      prefix -
      Returns:
      Throws:
      IOException
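
The three search phases named under searchMode (an initial match, then the start and the end of the run of matches) can be sketched over an in-memory array of sorted records. This is an illustration under stated assumptions, not the library's algorithm: the real method works on offsets in a block source and uses a delimiter and a cache, and all method names below are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PrefixRunSketch {

    /** Compares the leading bytes of record against prefix (unsigned, lexicographic); 0 means "starts with prefix". */
    static int comparePrefix(byte[] record, byte[] prefix) {
        int n = Math.min(record.length, prefix.length);
        int c = Arrays.compareUnsigned(record, 0, n, prefix, 0, n);
        if (c != 0) {
            return c;
        }
        // A record shorter than the prefix sorts before every record that has the prefix.
        return record.length < prefix.length ? -1 : 0;
    }

    /** Phase 1: index of any record that starts with prefix, or -1 if there is none. */
    static int findAnyMatch(byte[][] sorted, byte[] prefix) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = comparePrefix(sorted[mid], prefix);
            if (c == 0) return mid;
            if (c < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }

    /** Phase 2: first index of the run of matches; sorted[match] must be a match. */
    static int findRunStart(byte[][] sorted, byte[] prefix, int match) {
        int lo = 0, hi = match; // invariant: sorted[hi] is a match
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (comparePrefix(sorted[mid], prefix) < 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Phase 3: index one past the last record of the run; sorted[match] must be a match. */
    static int findRunEnd(byte[][] sorted, byte[] prefix, int match) {
        int lo = match + 1, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (comparePrefix(sorted[mid], prefix) > 0) hi = mid; else lo = mid + 1;
        }
        return lo;
    }

    /** Encodes lines as UTF-8 records; the input must already be in byte-wise sorted order. */
    static byte[][] records(String... lines) {
        byte[][] result = new byte[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            result[i] = lines[i].getBytes(StandardCharsets.UTF_8);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[][] data = records("a:1", "b:1", "b:2", "b:3", "c:1");
        byte[] prefix = "b:".getBytes(StandardCharsets.UTF_8);
        int match = findAnyMatch(data, prefix);
        System.out.println(findRunStart(data, prefix, match) + " " + findRunEnd(data, prefix, match)); // prints "1 4"
    }
}
```

Splitting the search this way lets the second and third phases restrict themselves to the ranges on either side of the initial match instead of scanning the whole input again.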