Class BufferOverInputStream

java.lang.Object
  org.aksw.commons.io.hadoop.binseach.bz2.BufferOverInputStream

All Implemented Interfaces:
  AutoCloseable, org.aksw.commons.io.util.channel.ChannelFactory<org.aksw.commons.io.seekable.api.Seekable>

@ThreadSafe
public class BufferOverInputStream
extends Object
implements org.aksw.commons.io.util.channel.ChannelFactory<org.aksw.commons.io.seekable.api.Seekable>
FIXME This class should be removed because it is superseded by BufferOverReadableChannel in aksw-commons-io!

Implementation of a byte array that caches data in buckets from an InputStream. Instances of this class are thread safe, but the obtained channels are not; each channel should only be operated on by one thread.

Differences to BufferedInputStream:
- This class caches all data read from the input stream, hence there is no mark/reset mechanism.
- The buffer is split into buckets (no data copying is required when allocating more space).
- Data is loaded on demand based on (possibly concurrent) requests to the seekable channels obtained with newChannel().

The closest known Hadoop counterpart is BufferedFSInputStream (which is based on BufferedInputStream).

- Author:
  raven
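The bucket-based caching scheme described above can be illustrated with a minimal, self-contained sketch. Everything below (class and method names, the doubling growth policy) is hypothetical and only demonstrates the technique, not the actual BufferOverInputStream implementation: all bytes read from the stream are retained in a list of buckets, so growing the cache never copies already-buffered data, and data is pulled from the stream lazily when a position beyond the cached range is requested.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cache all bytes read from an InputStream in a list of
// buckets so that growing the cache never copies already-buffered data.
class BucketBufferSketch {
    private final InputStream dataSupplier;
    private final List<byte[]> buckets = new ArrayList<>();
    private long knownDataSize = 0;        // total bytes cached so far
    private int posInActiveBucket = 0;     // write position in the last bucket
    private boolean supplierConsumed = false;

    BucketBufferSketch(InputStream dataSupplier, int initialBucketSize) {
        this.dataSupplier = dataSupplier;
        buckets.add(new byte[initialBucketSize]);
    }

    /** Load one chunk from the supplier into the active bucket. */
    void loadChunk() throws IOException {
        if (supplierConsumed) return;
        byte[] active = buckets.get(buckets.size() - 1);
        if (posInActiveBucket == active.length) {
            // Allocate a new (larger) bucket; existing data stays in place.
            active = new byte[active.length * 2];
            buckets.add(active);
            posInActiveBucket = 0;
        }
        int n = dataSupplier.read(active, posInActiveBucket, active.length - posInActiveBucket);
        if (n < 0) {
            supplierConsumed = true;
        } else {
            posInActiveBucket += n;
            knownDataSize += n;
        }
    }

    /** Random-access read of one cached byte; loads data on demand. */
    int byteAt(long pos) throws IOException {
        while (pos >= knownDataSize && !supplierConsumed) {
            loadChunk();
        }
        if (pos >= knownDataSize) return -1; // position past end of stream
        long offset = pos;
        for (byte[] bucket : buckets) {
            if (offset < bucket.length) return bucket[(int) offset] & 0xff;
            offset -= bucket.length;
        }
        return -1; // unreachable for valid positions
    }

    long getKnownDataSize() { return knownDataSize; }
}
```

Because every bucket except the last is always completely filled before a new one is allocated, mapping a linear position onto a (bucket, offset) pair is a simple subtraction loop, as shown in byteAt.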
Nested Class Summary
Nested Classes
Modifier and Type   Class
static class        BufferOverInputStream.BucketPointer
class               BufferOverInputStream.ByteArrayChannel
-
Field Summary
Fields

protected BufferOverInputStream.BucketPointer activeEnd
  End marker with two components (idx, pos); it is wrapped in an object to enable atomic replacement of the reference.
protected byte[][] buckets
  The buffered data
protected InputStream dataSupplier
  Supplier for additional data
protected boolean isDataSupplierConsumed
  Flag indicating that the dataSupplier has been consumed.
protected long knownDataSize
  The number of cached bytes.
protected int maxReadSize
  Maximum number of bytes to read from the dataSupplier in one request
protected int minReadSize
-
Constructor Summary
Constructors
BufferOverInputStream(int initialBucketSize, InputStream dataSupplier)
-
Method Summary
Modifier and Type   Method
void   close()
static BufferOverInputStream   create(InputStream in, int maxReadSize, int... preconfiguredBucketSizes)
int   doRead(BufferOverInputStream.ByteArrayChannel reader, ByteBuffer dst)
protected void   ensureCapacityInActiveBucket()
long   getKnownDataSize()
static BufferOverInputStream.BucketPointer   getPointer(byte[][] buckets, BufferOverInputStream.BucketPointer end, long pos)
static long   getPosition(byte[][] buckets, int idx, int pos)
boolean   isDataSupplierConsumed()
protected void   loadData(int needed)
  Fetch a chunk from the input stream.
protected void   loadDataUpTo(long requestedPos)
  Preload data up to and including the requested position.
static void   main(String[] args)
static void   main2(String[] args)
org.aksw.commons.io.seekable.api.Seekable   newChannel()
protected int   nextBucketSize()
-
-
-
Field Detail
-
buckets
protected byte[][] buckets
The buffered data
-
activeEnd
protected BufferOverInputStream.BucketPointer activeEnd
End marker with two components (idx, pos). It is wrapped in an object to enable atomic replacement of the reference. The pointer is monotonic in the sense that the end marker's logical linear location only increases. Reading an old version while a new one has been set will only cause a read to return at the old boundary; a subsequent synchronized check for whether loading of additional data is needed is then made anyway.
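The atomic-replacement pattern described for activeEnd can be sketched in isolation. The names below are hypothetical; the point is that the two components (bucket index, position) live in one immutable object, so a reader always sees a consistent pair even while a writer is publishing a newer one:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the "wrap two fields in one immutable object" pattern:
// readers always observe a consistent (bucketIdx, posInBucket) pair because the
// whole pointer object is swapped atomically, never mutated in place.
final class BucketPointerSketch {
    final int bucketIdx;   // index of the active bucket
    final int posInBucket; // end position within that bucket

    BucketPointerSketch(int bucketIdx, int posInBucket) {
        this.bucketIdx = bucketIdx;
        this.posInBucket = posInBucket;
    }
}

class EndMarkerSketch {
    // Writers publish a new pointer; readers may briefly observe the old one,
    // which at worst makes a read stop at the old (smaller) boundary.
    private final AtomicReference<BucketPointerSketch> activeEnd =
            new AtomicReference<>(new BucketPointerSketch(0, 0));

    void advanceTo(int bucketIdx, int posInBucket) {
        activeEnd.set(new BucketPointerSketch(bucketIdx, posInBucket));
    }

    BucketPointerSketch snapshot() {
        return activeEnd.get();
    }
}
```

Had the two components been plain mutable fields instead, a reader could observe the new idx paired with the old pos (or vice versa), pointing at a garbage location.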
-
knownDataSize
protected long knownDataSize
The number of cached bytes. Corresponds to the linear representation of activeEnd.
-
dataSupplier
protected InputStream dataSupplier
Supplier for additional data
-
minReadSize
protected int minReadSize
-
maxReadSize
protected int maxReadSize
Maximum number of bytes to read from the dataSupplier in one request
-
isDataSupplierConsumed
protected boolean isDataSupplierConsumed
Flag indicating that the dataSupplier has been consumed. This is the case when reading from the dataSupplier into a buffer returns -1.
-
-
Constructor Detail
-
BufferOverInputStream
public BufferOverInputStream(int initialBucketSize, InputStream dataSupplier)
-
-
Method Detail
-
getKnownDataSize
public long getKnownDataSize()
-
isDataSupplierConsumed
public boolean isDataSupplierConsumed()
-
create
public static BufferOverInputStream create(InputStream in, int maxReadSize, int... preconfiguredBucketSizes)
- Parameters:
  in -
  maxReadSize - Maximum number of bytes to request from the input stream at once
  preconfiguredBucketSizes -
- Returns:
-
getPosition
public static long getPosition(byte[][] buckets, int idx, int pos)
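getPosition has no description here; judging by its signature and by the note that knownDataSize "corresponds to the linear representation of activeEnd", it presumably maps a (bucket index, offset) pair to a linear position. A hedged sketch of that arithmetic (hypothetical name; the real method may differ):

```java
// Hypothetical sketch of mapping a (bucket index, offset) pair to a linear
// position: sum the lengths of all preceding buckets, then add the offset.
// This assumes all buckets before idx are completely filled.
class PositionSketch {
    static long linearPosition(byte[][] buckets, int idx, int pos) {
        long result = 0;
        for (int i = 0; i < idx; i++) {
            result += buckets[i].length;
        }
        return result + pos;
    }
}
```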
-
getPointer
public static BufferOverInputStream.BucketPointer getPointer(byte[][] buckets, BufferOverInputStream.BucketPointer end, long pos)
- Parameters:
  buckets -
  end -
  pos -
- Returns:
  Pointer to a valid location in the known data block, or null
-
newChannel
public org.aksw.commons.io.seekable.api.Seekable newChannel()
- Specified by:
newChannel in interface org.aksw.commons.io.util.channel.ChannelFactory<org.aksw.commons.io.seekable.api.Seekable>
-
nextBucketSize
protected int nextBucketSize()
-
doRead
public int doRead(BufferOverInputStream.ByteArrayChannel reader, ByteBuffer dst)
-
loadDataUpTo
protected void loadDataUpTo(long requestedPos)
Preload data up to and including the requested position. It is inclusive in order to allow checking whether the requested position is in range.
- Parameters:
  requestedPos -
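The inclusive contract can be sketched as a load loop (all names hypothetical; the stream is simulated by a length counter so the sketch is self-contained): data is fetched until the requested position itself is cached or the supplier is exhausted, after which `pos < knownDataSize` decides whether the position is in range.

```java
// Hypothetical sketch of the inclusive preload contract: keep fetching chunks
// until the requested position itself is cached (or the stream ends), so the
// caller can then test pos < knownDataSize to decide whether pos is in range.
class PreloadSketch {
    long knownDataSize = 0;
    boolean supplierConsumed = false;
    private final int chunkSize;
    private final long streamLength; // stands in for a real InputStream

    PreloadSketch(int chunkSize, long streamLength) {
        this.chunkSize = chunkSize;
        this.streamLength = streamLength;
    }

    /** Simulated chunk fetch; a real implementation would read the stream. */
    private void loadChunk() {
        long n = Math.min(chunkSize, streamLength - knownDataSize);
        if (n <= 0) supplierConsumed = true; else knownDataSize += n;
    }

    void loadDataUpTo(long requestedPos) {
        // "<=" makes the load inclusive of requestedPos itself.
        while (knownDataSize <= requestedPos && !supplierConsumed) {
            loadChunk();
        }
    }

    boolean isInRange(long pos) {
        loadDataUpTo(pos);
        return pos < knownDataSize;
    }
}
```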
-
loadData
protected void loadData(int needed)
Fetch a chunk from the input stream.
-
ensureCapacityInActiveBucket
protected void ensureCapacityInActiveBucket()
-
close
public void close() throws Exception
- Specified by:
  close in interface AutoCloseable
- Throws:
  Exception
-
-