Class BufferOverInputStream

  • All Implemented Interfaces:
    AutoCloseable, org.aksw.commons.io.util.channel.ChannelFactory<org.aksw.commons.io.seekable.api.Seekable>

    @ThreadSafe
    public class BufferOverInputStream
    extends Object
    implements org.aksw.commons.io.util.channel.ChannelFactory<org.aksw.commons.io.seekable.api.Seekable>
    FIXME This class should be removed because it is superseded by BufferOverReadableChannel in aksw-commons-io!

    Implementation of a byte array that caches data in buckets from an InputStream. Instances of this class are thread safe, but the obtained channels are not; each channel should only be operated on by one thread.

    Differences to BufferedInputStream:
      - This class caches all data read from the input stream, hence there is no mark / reset mechanism.
      - The buffer is split into buckets (no data copying required when allocating more space).
      - Data is loaded on demand based on (possibly concurrent) requests to the seekable channels obtained with newChannel().

    The closest Hadoop counterpart known to the author is BufferedFSInputStream (which is based on BufferedInputStream).
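
The bucket idea from the class description can be illustrated with a minimal, self-contained sketch. The class name BucketedBytes, the doubling growth policy and the 1 GiB cap are assumptions made for illustration only; none of them is taken from the actual implementation. The point shown is that growing the buffer appends a new bucket and never copies the bytes already stored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a bucketed byte buffer; not the library's code. */
class BucketedBytes {
    private final List<byte[]> buckets = new ArrayList<>();
    private int posInLast; // write offset within the last bucket
    private long size;     // total number of bytes stored

    BucketedBytes(int initialBucketSize) {
        if (initialBucketSize <= 0) {
            throw new IllegalArgumentException("initialBucketSize must be positive");
        }
        buckets.add(new byte[initialBucketSize]);
    }

    /** Appends len bytes from src; full buckets are left untouched. */
    void append(byte[] src, int off, int len) {
        while (len > 0) {
            byte[] last = buckets.get(buckets.size() - 1);
            if (posInLast == last.length) {
                // Assumed policy: each new bucket is twice as large as the previous one.
                last = new byte[(int) Math.min(last.length * 2L, 1 << 30)];
                buckets.add(last);
                posInLast = 0;
            }
            int n = Math.min(len, last.length - posInLast);
            System.arraycopy(src, off, last, posInLast, n);
            posInLast += n;
            off += n;
            len -= n;
            size += n;
        }
    }

    /** Returns the byte at the given linear index. */
    byte get(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        long remaining = index;
        for (byte[] bucket : buckets) {
            if (remaining < bucket.length) {
                return bucket[(int) remaining];
            }
            remaining -= bucket.length;
        }
        throw new AssertionError("unreachable");
    }

    long size() { return size; }

    int bucketCount() { return buckets.size(); }
}
```

With an initial bucket size of 2, appending seven bytes yields three buckets of lengths 2, 4 and 8 under the assumed doubling policy.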
    Author:
    raven
    • Field Detail

      • buckets

        protected byte[][] buckets
        The buffered data
      • activeEnd

        protected BufferOverInputStream.BucketPointer activeEnd
        End marker with two components (idx, pos). It is wrapped in an object to enable atomic replacement of the reference. The pointer is monotonic in the sense that the end marker's logical linear location only ever increases. Reading an old version after a new one has been set only causes a read to return at the old boundary; a subsequent synchronized check for whether additional data needs to be loaded is made anyway.
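
The atomic-replacement pattern described for activeEnd can be sketched as follows. This is a hedged illustration: the class EndMarkerSketch, the nested Pointer type (a stand-in for BucketPointer) and the use of a volatile field are assumptions, since this page does not show how the field is actually declared or updated.

```java
/** Hypothetical sketch of an atomically replaced, monotonic end marker. */
class EndMarkerSketch {

    /** Immutable (idx, pos) pair; readers always see a consistent combination. */
    static final class Pointer {
        final int idx;
        final int pos;

        Pointer(int idx, int pos) {
            this.idx = idx;
            this.pos = pos;
        }
    }

    // Assumption: volatile makes a newly published pointer visible to readers
    // without requiring them to take a lock.
    private volatile Pointer activeEnd = new Pointer(0, 0);

    /** Lock-free read; the result may be stale but is never torn. */
    Pointer snapshot() {
        return activeEnd;
    }

    /** Writers are serialized and may only move the marker forward. */
    synchronized void advance(int newIdx, int newPos) {
        Pointer cur = activeEnd;
        boolean backwards = newIdx < cur.idx || (newIdx == cur.idx && newPos < cur.pos);
        if (backwards) {
            throw new IllegalArgumentException("end marker must not move backwards");
        }
        activeEnd = new Pointer(newIdx, newPos);
    }
}
```

A reader that holds an old snapshot simply stops at the old boundary; because both components live in one immutable object, it can never observe the idx of one update combined with the pos of another.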
      • knownDataSize

        protected long knownDataSize
        The number of cached bytes. Corresponds to the linear representation of activeEnd.
      • dataSupplier

        protected InputStream dataSupplier
        Supplier for additional data
      • minReadSize

        protected int minReadSize
      • maxReadSize

        protected int maxReadSize
        Maximum number of bytes to read from the dataSupplier in one request
      • isDataSupplierConsumed

        protected boolean isDataSupplierConsumed
        Flag to indicate that the dataSupplier has been consumed. This is the case when dataSupplier.read(buffer) returns -1.
    • Constructor Detail

      • BufferOverInputStream

        public BufferOverInputStream​(int initialBucketSize,
                                     InputStream dataSupplier)
    • Method Detail

      • getKnownDataSize

        public long getKnownDataSize()
      • isDataSupplierConsumed

        public boolean isDataSupplierConsumed()
      • create

        public static BufferOverInputStream create​(InputStream in,
                                                   int maxReadSize,
                                                   int... preconfiguredBucketSizes)
        Parameters:
        in - The input stream to buffer
        maxReadSize - Maximum number of bytes to request from the input stream at once
        preconfiguredBucketSizes -
        Returns:
      • getPosition

        public static long getPosition​(byte[][] buckets,
                                       int idx,
                                       int pos)
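
The page gives no description for getPosition. One plausible reading, consistent with the statement that knownDataSize is the linear representation of activeEnd, is that it converts a (bucket index, offset) pair into a linear byte position. The sketch below implements that reading; the class name LinearPositionSketch is hypothetical and the actual method may behave differently.

```java
/** Hypothetical sketch: linear position of (idx, pos) within a bucket array. */
class LinearPositionSketch {

    static long getPosition(byte[][] buckets, int idx, int pos) {
        long result = 0;
        // Sum the lengths of all buckets that precede the one at idx.
        for (int i = 0; i < idx; i++) {
            result += buckets[i].length;
        }
        // Add the offset within the bucket at idx.
        return result + pos;
    }
}
```

For buckets of lengths 4, 8 and 16, the pair (idx = 2, pos = 3) maps to linear position 4 + 8 + 3 = 15.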
      • newChannel

        public org.aksw.commons.io.seekable.api.Seekable newChannel()
        Specified by:
        newChannel in interface org.aksw.commons.io.util.channel.ChannelFactory<org.aksw.commons.io.seekable.api.Seekable>
      • nextBucketSize

        protected int nextBucketSize()
      • loadDataUpTo

        protected void loadDataUpTo​(long requestedPos)
        Preload data up to and including the requested position. The bound is inclusive so that callers can check whether the requested position is in range.
        Parameters:
        requestedPos -
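
The inclusive loading contract can be sketched as a synchronized loop that keeps reading until the requested position is covered or the stream ends. Everything below is an assumption-laden illustration: the class OnDemandLoaderSketch and the helper isInRange are hypothetical, and a single ByteArrayOutputStream stands in for the bucket array to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of on-demand loading up to an inclusive position. */
class OnDemandLoaderSketch {
    private final InputStream dataSupplier;
    private final int maxReadSize;
    // Simplification: one growing buffer instead of the bucket array.
    private final ByteArrayOutputStream cache = new ByteArrayOutputStream();
    private long knownDataSize;
    private boolean dataSupplierConsumed;

    OnDemandLoaderSketch(InputStream dataSupplier, int maxReadSize) {
        if (maxReadSize <= 0) {
            throw new IllegalArgumentException("maxReadSize must be positive");
        }
        this.dataSupplier = dataSupplier;
        this.maxReadSize = maxReadSize;
    }

    /** Reads chunks until position requestedPos is cached or the stream ends. */
    synchronized void loadDataUpTo(long requestedPos) {
        byte[] chunk = new byte[maxReadSize];
        while (!dataSupplierConsumed && knownDataSize <= requestedPos) {
            int n;
            try {
                n = dataSupplier.read(chunk, 0, chunk.length);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (n < 0) {
                dataSupplierConsumed = true; // read returned -1: end of stream
            } else {
                cache.write(chunk, 0, n);
                knownDataSize += n;
            }
        }
    }

    /** A position is in range if a byte exists there once loading was attempted. */
    synchronized boolean isInRange(long pos) {
        loadDataUpTo(pos);
        return pos < knownDataSize;
    }

    synchronized long getKnownDataSize() { return knownDataSize; }

    synchronized boolean isDataSupplierConsumed() { return dataSupplierConsumed; }
}
```

Because the bound is inclusive, asking for a position one past the last byte forces a read that hits the end of the stream, which is how the range check learns that the position does not exist.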
      • loadData

        protected void loadData​(int needed)
        Fetch a chunk from the input stream
      • ensureCapacityInActiveBucket

        protected void ensureCapacityInActiveBucket()