Package io.lakefs
Class LakeFSFileSystem
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.hadoop.fs.FileSystem
      io.lakefs.LakeFSFileSystem

All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.hadoop.conf.Configurable
public class LakeFSFileSystem
extends org.apache.hadoop.fs.FileSystem

A dummy implementation of the core lakeFS filesystem. This class implements a LakeFSFileSystem that can be registered with Spark and supports limited write and read actions. Configure Spark to use the lakeFS filesystem with the property: spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem.
Configure the application or the filesystem with the following properties:
fs.lakefs.endpoint=http://localhost:8000/api/v1
fs.lakefs.access.key=AKIAIOSFODNN7EXAMPLE
fs.lakefs.secret.key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
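For example, a Spark application can set these as Hadoop properties in spark-defaults.conf. The endpoint and credentials below are the sample values above; example-repo in the usage note is a hypothetical repository name:

```properties
# Register the lakeFS filesystem and point it at a local lakeFS instance
spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem
spark.hadoop.fs.lakefs.endpoint=http://localhost:8000/api/v1
spark.hadoop.fs.lakefs.access.key=AKIAIOSFODNN7EXAMPLE
spark.hadoop.fs.lakefs.secret.key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```

With this in place, paths of the form lakefs://&lt;repository&gt;/&lt;ref&gt;/&lt;path&gt; resolve through this filesystem, e.g. spark.read.parquet("lakefs://example-repo/main/data/").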
-
Field Summary
Modifier and Type                Field
static org.slf4j.Logger          LOG
static org.slf4j.Logger          OPERATIONS_LOG
-
Constructor Summary
Constructor              Description
LakeFSFileSystem()
-
Method Summary
Modifier and Type    Method    Description
org.apache.hadoop.fs.FSDataOutputStream    append(org.apache.hadoop.fs.Path path, int i, org.apache.hadoop.util.Progressable progressable)
org.apache.hadoop.fs.FSDataOutputStream    create(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.permission.FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress)    Called on a file write Spark/Hadoop action.
boolean    delete(org.apache.hadoop.fs.Path path, boolean recursive)
boolean    exists(org.apache.hadoop.fs.Path path)
long    getDefaultBlockSize()
long    getDefaultBlockSize(org.apache.hadoop.fs.Path path)
LakeFSFileStatus    getFileStatus(org.apache.hadoop.fs.Path path)    Return a file status object that represents the path.
String    getScheme()    Return the protocol scheme for the FileSystem.
URI    getUri()
org.apache.hadoop.fs.Path    getWorkingDirectory()
org.apache.hadoop.fs.FileStatus[]    globStatus(org.apache.hadoop.fs.Path pathPattern)
void    initialize(URI name, org.apache.hadoop.conf.Configuration conf)
org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus>    listFiles(org.apache.hadoop.fs.Path f, boolean recursive)
org.apache.hadoop.fs.FileStatus[]    listStatus(org.apache.hadoop.fs.Path path)
boolean    mkdirs(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.permission.FsPermission fsPermission)    Make the given path and all non-existent parents into directories.
org.apache.hadoop.fs.FSDataInputStream    open(org.apache.hadoop.fs.Path path, int bufSize)
io.lakefs.ObjectLocation    pathToObjectLocation(org.apache.hadoop.fs.Path path)    Returns the Location with repository, ref and path used by lakeFS, based on the filesystem path.
boolean    rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst)    Rename, behaving similarly to the POSIX "mv" command, but non-atomically.
void    setWorkingDirectory(org.apache.hadoop.fs.Path path)
static org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus>    toLocatedFileStatusIterator(org.apache.hadoop.fs.RemoteIterator<? extends org.apache.hadoop.fs.LocatedFileStatus> iterator)
protected <R> R    withFileSystemAndTranslatedPhysicalPath(String physicalAddress, io.lakefs.LakeFSFileSystem.BiFunctionWithIOException<org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path,R> f)
Methods inherited from class org.apache.hadoop.fs.FileSystem
access, addDelegationTokens, append, append, areSymlinksEnabled, cancelDeleteOnExit, canonicalizeUri, checkPath, clearStatistics, close, closeAll, closeAllForUGI, completeLocalOutput, concat, copyFromLocalFile, copyFromLocalFile, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, create, createNewFile, createNonRecursive, createNonRecursive, createNonRecursive, createSnapshot, createSnapshot, createSymlink, delete, deleteOnExit, deleteSnapshot, enableSymlinks, fixRelativePart, get, get, get, getAclStatus, getAllStatistics, getBlockSize, getCanonicalServiceName, getCanonicalUri, getChildFileSystems, getContentSummary, getDefaultPort, getDefaultReplication, getDefaultReplication, getDefaultUri, getDelegationToken, getFileBlockLocations, getFileBlockLocations, getFileChecksum, getFileChecksum, getFileLinkStatus, getFileSystemClass, getFSofPath, getHomeDirectory, getInitialWorkingDirectory, getLength, getLinkTarget, getLocal, getName, getNamed, getReplication, getServerDefaults, getServerDefaults, getStatistics, getStatistics, getStatus, getStatus, getUsed, getXAttr, getXAttrs, getXAttrs, globStatus, isDirectory, isFile, listCorruptFileBlocks, listLocatedStatus, listLocatedStatus, listStatus, listStatus, listStatus, listStatusIterator, listXAttrs, makeQualified, mkdirs, mkdirs, modifyAclEntries, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, newInstance, newInstance, newInstance, newInstanceLocal, open, primitiveCreate, primitiveMkdir, primitiveMkdir, printStatistics, processDeleteOnExit, removeAcl, removeAclEntries, removeDefaultAcl, removeXAttr, rename, renameSnapshot, resolveLink, resolvePath, setAcl, setDefaultUri, setDefaultUri, setOwner, setPermission, setReplication, setTimes, setVerifyChecksum, setWriteChecksum, setXAttr, setXAttr, startLocalOutput, supportsSymlinks, truncate
-
Method Detail
-
getUri
public URI getUri()
- Specified by:
getUri in class org.apache.hadoop.fs.FileSystem
-
initialize
public void initialize(URI name, org.apache.hadoop.conf.Configuration conf) throws IOException
- Overrides:
initialize in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
withFileSystemAndTranslatedPhysicalPath
protected <R> R withFileSystemAndTranslatedPhysicalPath(String physicalAddress, io.lakefs.LakeFSFileSystem.BiFunctionWithIOException<org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path,R> f) throws URISyntaxException, IOException
- Returns:
- FileSystem suitable for the translated physical address
- Throws:
URISyntaxException, IOException
-
getDefaultBlockSize
public long getDefaultBlockSize(org.apache.hadoop.fs.Path path)
- Overrides:
getDefaultBlockSize in class org.apache.hadoop.fs.FileSystem
-
getDefaultBlockSize
public long getDefaultBlockSize()
- Overrides:
getDefaultBlockSize in class org.apache.hadoop.fs.FileSystem
-
open
public org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.Path path, int bufSize) throws IOException
- Specified by:
open in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
listFiles
public org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus> listFiles(org.apache.hadoop.fs.Path f, boolean recursive) throws FileNotFoundException, IOException
- Overrides:
listFiles in class org.apache.hadoop.fs.FileSystem
- Throws:
FileNotFoundException, IOException
-
create
public org.apache.hadoop.fs.FSDataOutputStream create(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.permission.FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress) throws IOException

Called on a file write Spark/Hadoop action. Opens an output stream for writing the content of the file at path.
- Specified by:
create in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
append
public org.apache.hadoop.fs.FSDataOutputStream append(org.apache.hadoop.fs.Path path, int i, org.apache.hadoop.util.Progressable progressable) throws IOException
- Specified by:
append in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
rename
public boolean rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException

Rename, behaving similarly to the POSIX "mv" command, but non-atomically.
1. Rename is only supported for uncommitted data on the same branch.
2. The following rename scenarios are supported:
   * file -> existing file: rename(src.txt, existing-dst.txt) -> existing-dst.txt; existing-dst.txt is overwritten.
   * file -> existing directory: rename(src.txt, existing-dstdir) -> existing-dstdir/src.txt
   * file -> non-existing destination: the src file is renamed to a file with the destination name. rename(src.txt, non-existing-dst) -> non-existing-dst, where non-existing-dst is a file.
   * directory -> non-existing directory: rename(srcDir (containing srcDir/a.txt), non-existing-dstdir) -> non-existing-dstdir/a.txt
   * directory -> existing directory: rename(srcDir (containing srcDir/a.txt), existing-dstdir) -> existing-dstdir/srcDir/a.txt
3. The rename dst path can be an uncommitted file, which will be overwritten by the rename operation.
4. The mtime of the src object is not preserved.
- Specified by:
rename in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
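The destination-resolution rules above can be sketched as a pure function. This is an illustration of the documented scenarios, not the lakeFS implementation; the class and method names are hypothetical, and paths are modeled as plain strings:

```java
// Illustrative sketch: resolves the effective destination path for each
// documented rename scenario. Not the actual LakeFSFileSystem logic.
public class RenameDestinationSketch {
    /**
     * @param src       source path, e.g. "src.txt" or "srcDir"
     * @param srcIsDir  whether src is a directory
     * @param dst       requested destination path
     * @param dstExists whether dst already exists
     * @param dstIsDir  whether an existing dst is a directory
     * @return the path the source (or, for directories, its parent entry) ends up at
     */
    public static String resolve(String src, boolean srcIsDir,
                                 String dst, boolean dstExists, boolean dstIsDir) {
        String srcName = src.substring(src.lastIndexOf('/') + 1);
        if (!srcIsDir) {
            // file -> existing file (overwrite) or file -> non-existing dst (rename)
            if (!dstExists || !dstIsDir) return dst;
            // file -> existing directory: the file moves under the directory
            return dst + "/" + srcName;
        }
        // directory -> non-existing directory: children move directly under dst
        if (!dstExists) return dst;
        // directory -> existing directory: the whole source directory nests under dst
        return dst + "/" + srcName;
    }
}
```

Note the asymmetry the Javadoc calls out: a directory renamed onto an existing directory nests under it (existing-dstdir/srcDir/a.txt), while a rename onto a non-existing destination moves the children directly (non-existing-dstdir/a.txt).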
-
delete
public boolean delete(org.apache.hadoop.fs.Path path, boolean recursive) throws IOException
- Specified by:
delete in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
listStatus
public org.apache.hadoop.fs.FileStatus[] listStatus(org.apache.hadoop.fs.Path path) throws FileNotFoundException, IOException
- Specified by:
listStatus in class org.apache.hadoop.fs.FileSystem
- Throws:
FileNotFoundException, IOException
-
setWorkingDirectory
public void setWorkingDirectory(org.apache.hadoop.fs.Path path)
- Specified by:
setWorkingDirectory in class org.apache.hadoop.fs.FileSystem
-
getWorkingDirectory
public org.apache.hadoop.fs.Path getWorkingDirectory()
- Specified by:
getWorkingDirectory in class org.apache.hadoop.fs.FileSystem
-
mkdirs
public boolean mkdirs(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.permission.FsPermission fsPermission) throws IOException

Make the given path and all non-existent parents into directories. Uses the same technique as the S3A implementation: a zero-byte object whose key ends with the delimiter ('/') marks that the directory exists. When an object is written into the directory, the marker can be deleted.
- Specified by:
mkdirs in class org.apache.hadoop.fs.FileSystem
- Parameters:
path - path to create
fsPermission - permission to apply (passed to the underlying filesystem)
- Returns:
true if the operation succeeded
- Throws:
IOException - corresponds to the translated API exception
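The directory-marker technique described above can be sketched as follows. This is a hedged illustration, not the lakeFS implementation; the class and method names are hypothetical:

```java
// Illustrative sketch: derives the key of the zero-byte marker object that
// records a directory's existence, S3A-style. Not the actual lakeFS code.
public class DirectoryMarkerSketch {
    /** Returns the object key that marks dirPath as an existing directory. */
    public static String markerKeyFor(String dirPath) {
        // Normalize away a trailing delimiter, then append exactly one, so the
        // key names the directory itself rather than an object inside it.
        String normalized = dirPath.endsWith("/")
                ? dirPath.substring(0, dirPath.length() - 1)
                : dirPath;
        return normalized + "/";
    }
}
```

A store listing that finds the marker key reports the directory as existing even when it contains no real objects; once a real object is written under the prefix, the marker becomes redundant and can be deleted.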
-
getFileStatus
public LakeFSFileStatus getFileStatus(org.apache.hadoop.fs.Path path) throws IOException
Return a file status object that represents the path.
- Specified by:
getFileStatus in class org.apache.hadoop.fs.FileSystem
- Parameters:
path - to a file or directory
- Returns:
- a LakeFSFileStatus object
- Throws:
FileNotFoundException - when the path does not exist
IOException - on an API call or underlying filesystem exception
-
getScheme
public String getScheme()
Return the protocol scheme for the FileSystem.
- Overrides:
getScheme in class org.apache.hadoop.fs.FileSystem
- Returns:
- lakefs scheme
-
globStatus
public org.apache.hadoop.fs.FileStatus[] globStatus(org.apache.hadoop.fs.Path pathPattern) throws IOException
- Overrides:
globStatus in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
exists
public boolean exists(org.apache.hadoop.fs.Path path) throws IOException
- Overrides:
exists in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
pathToObjectLocation
@Nonnull public io.lakefs.ObjectLocation pathToObjectLocation(org.apache.hadoop.fs.Path path)
Returns the Location (repository, ref, and path) used by lakeFS, based on the filesystem path.
- Parameters:
path - to extract information from
- Returns:
- lakeFS Location with repository, ref and path
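Assuming paths follow the lakefs://&lt;repository&gt;/&lt;ref&gt;/&lt;path&gt; layout used in the configuration section, the split this method performs can be sketched with plain URI parsing. The class below is a hypothetical stand-in for io.lakefs.ObjectLocation, not the real implementation:

```java
import java.net.URI;

// Illustrative sketch: splits a lakefs:// path into the repository, ref, and
// in-repository path components. Not the actual pathToObjectLocation code.
public class ObjectLocationSketch {
    /** Parses "lakefs://<repository>/<ref>/<path>" into {repository, ref, path}. */
    public static String[] parse(String lakefsPath) {
        URI uri = URI.create(lakefsPath);
        String repository = uri.getHost();                 // authority = repository
        String rest = uri.getPath().replaceFirst("^/", ""); // "<ref>/<path>"
        int slash = rest.indexOf('/');
        String ref = slash < 0 ? rest : rest.substring(0, slash);
        String path = slash < 0 ? "" : rest.substring(slash + 1);
        return new String[] {repository, ref, path};
    }
}
```

For example, lakefs://example-repo/main/data/file.parquet (a hypothetical path) would split into repository example-repo, ref main, and path data/file.parquet.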
-
toLocatedFileStatusIterator
public static org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus> toLocatedFileStatusIterator(org.apache.hadoop.fs.RemoteIterator<? extends org.apache.hadoop.fs.LocatedFileStatus> iterator)