Package io.lakefs

Class LakeFSFileSystem

  • All Implemented Interfaces:
    Closeable, AutoCloseable, org.apache.hadoop.conf.Configurable

    public class LakeFSFileSystem
    extends org.apache.hadoop.fs.FileSystem
    An implementation of the core lakeFS Filesystem. This class implements a LakeFSFileSystem that can be registered with Spark and supports limited write and read actions.

    Configure Spark to use the lakeFS filesystem via the property: spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem.

    Configure the application or the filesystem with the following properties:
    fs.lakefs.endpoint=http://localhost:8000/api/v1
    fs.lakefs.access.key=AKIAIOSFODNN7EXAMPLE
    fs.lakefs.secret.key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
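    For example, these properties can be passed to a Spark application as Hadoop configuration. This is an illustrative invocation using the example endpoint and credentials above; the --packages coordinate and version are assumptions and should match your deployment:

    ```shell
    # Illustrative spark-shell invocation (config fragment, not part of the API docs).
    # The assembly version below is an assumption; use the one matching your setup.
    spark-shell \
      --packages io.lakefs:hadoop-lakefs-assembly:0.2.4 \
      --conf spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem \
      --conf spark.hadoop.fs.lakefs.endpoint=http://localhost:8000/api/v1 \
      --conf spark.hadoop.fs.lakefs.access.key=AKIAIOSFODNN7EXAMPLE \
      --conf spark.hadoop.fs.lakefs.secret.key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    ```

    With this configuration in place, paths of the form lakefs://&lt;repository&gt;/&lt;ref&gt;/&lt;path&gt; are handled by this filesystem.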

    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.hadoop.fs.FileSystem

        org.apache.hadoop.fs.FileSystem.Statistics
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static org.slf4j.Logger LOG  
      static org.slf4j.Logger OPERATIONS_LOG  
      • Fields inherited from class org.apache.hadoop.fs.FileSystem

        DEFAULT_FS, FS_DEFAULT_NAME_KEY, SHUTDOWN_HOOK_PRIORITY, statistics
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      org.apache.hadoop.fs.FSDataOutputStream append​(org.apache.hadoop.fs.Path path, int i, org.apache.hadoop.util.Progressable progressable)  
      org.apache.hadoop.fs.FSDataOutputStream create​(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.permission.FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress)
      Called on a file write Spark/Hadoop action.
      boolean delete​(org.apache.hadoop.fs.Path path, boolean recursive)  
      boolean exists​(org.apache.hadoop.fs.Path path)  
      long getDefaultBlockSize()  
      long getDefaultBlockSize​(org.apache.hadoop.fs.Path path)  
      LakeFSFileStatus getFileStatus​(org.apache.hadoop.fs.Path path)
      Return a file status object that represents the path.
      String getScheme()
      Return the protocol scheme for the FileSystem.
      URI getUri()  
      org.apache.hadoop.fs.Path getWorkingDirectory()  
      org.apache.hadoop.fs.FileStatus[] globStatus​(org.apache.hadoop.fs.Path pathPattern)  
      void initialize​(URI name, org.apache.hadoop.conf.Configuration conf)  
      org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus> listFiles​(org.apache.hadoop.fs.Path f, boolean recursive)  
      org.apache.hadoop.fs.FileStatus[] listStatus​(org.apache.hadoop.fs.Path path)  
      boolean mkdirs​(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.permission.FsPermission fsPermission)
      Make the given path and all non-existent parents into directories.
      org.apache.hadoop.fs.FSDataInputStream open​(org.apache.hadoop.fs.Path path, int bufSize)  
      io.lakefs.ObjectLocation pathToObjectLocation​(org.apache.hadoop.fs.Path path)
      Returns the lakeFS ObjectLocation (repository, ref, and path) derived from the filesystem path.
      boolean rename​(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst)
      Rename, behaving similarly to the POSIX "mv" command, but non-atomically.
      void setWorkingDirectory​(org.apache.hadoop.fs.Path path)  
      static org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus> toLocatedFileStatusIterator​(org.apache.hadoop.fs.RemoteIterator<? extends org.apache.hadoop.fs.LocatedFileStatus> iterator)  
      protected <R> R withFileSystemAndTranslatedPhysicalPath​(String physicalAddress, io.lakefs.LakeFSFileSystem.BiFunctionWithIOException<org.apache.hadoop.fs.FileSystem,​org.apache.hadoop.fs.Path,​R> f)  
      • Methods inherited from class org.apache.hadoop.fs.FileSystem

        access, addDelegationTokens, append, append, areSymlinksEnabled, cancelDeleteOnExit, canonicalizeUri, checkPath, clearStatistics, close, closeAll, closeAllForUGI, completeLocalOutput, concat, copyFromLocalFile, copyFromLocalFile, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, create, createNewFile, createNonRecursive, createNonRecursive, createNonRecursive, createSnapshot, createSnapshot, createSymlink, delete, deleteOnExit, deleteSnapshot, enableSymlinks, fixRelativePart, get, get, get, getAclStatus, getAllStatistics, getBlockSize, getCanonicalServiceName, getCanonicalUri, getChildFileSystems, getContentSummary, getDefaultPort, getDefaultReplication, getDefaultReplication, getDefaultUri, getDelegationToken, getFileBlockLocations, getFileBlockLocations, getFileChecksum, getFileChecksum, getFileLinkStatus, getFileSystemClass, getFSofPath, getHomeDirectory, getInitialWorkingDirectory, getLength, getLinkTarget, getLocal, getName, getNamed, getReplication, getServerDefaults, getServerDefaults, getStatistics, getStatistics, getStatus, getStatus, getUsed, getXAttr, getXAttrs, getXAttrs, globStatus, isDirectory, isFile, listCorruptFileBlocks, listLocatedStatus, listLocatedStatus, listStatus, listStatus, listStatus, listStatusIterator, listXAttrs, makeQualified, mkdirs, mkdirs, modifyAclEntries, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, newInstance, newInstance, newInstance, newInstanceLocal, open, primitiveCreate, primitiveMkdir, primitiveMkdir, printStatistics, processDeleteOnExit, removeAcl, removeAclEntries, removeDefaultAcl, removeXAttr, rename, renameSnapshot, resolveLink, resolvePath, setAcl, setDefaultUri, setDefaultUri, setOwner, setPermission, setReplication, setTimes, setVerifyChecksum, setWriteChecksum, setXAttr, setXAttr, startLocalOutput, supportsSymlinks, truncate
      • Methods inherited from class org.apache.hadoop.conf.Configured

        getConf, setConf
    • Field Detail

      • LOG

        public static final org.slf4j.Logger LOG
      • OPERATIONS_LOG

        public static final org.slf4j.Logger OPERATIONS_LOG
    • Constructor Detail

      • LakeFSFileSystem

        public LakeFSFileSystem()
    • Method Detail

      • getUri

        public URI getUri()
        Specified by:
        getUri in class org.apache.hadoop.fs.FileSystem
      • initialize

        public void initialize​(URI name,
                               org.apache.hadoop.conf.Configuration conf)
                        throws IOException
        Overrides:
        initialize in class org.apache.hadoop.fs.FileSystem
        Throws:
        IOException
      • withFileSystemAndTranslatedPhysicalPath

        protected <R> R withFileSystemAndTranslatedPhysicalPath​(String physicalAddress,
                                                                io.lakefs.LakeFSFileSystem.BiFunctionWithIOException<org.apache.hadoop.fs.FileSystem,​org.apache.hadoop.fs.Path,​R> f)
                                                         throws URISyntaxException,
                                                                IOException
        Returns:
        FileSystem suitable for the translated physical address
        Throws:
        URISyntaxException
        IOException
      • getDefaultBlockSize

        public long getDefaultBlockSize​(org.apache.hadoop.fs.Path path)
        Overrides:
        getDefaultBlockSize in class org.apache.hadoop.fs.FileSystem
      • getDefaultBlockSize

        public long getDefaultBlockSize()
        Overrides:
        getDefaultBlockSize in class org.apache.hadoop.fs.FileSystem
      • open

        public org.apache.hadoop.fs.FSDataInputStream open​(org.apache.hadoop.fs.Path path,
                                                           int bufSize)
                                                    throws IOException
        Specified by:
        open in class org.apache.hadoop.fs.FileSystem
        Throws:
        IOException
      • create

        public org.apache.hadoop.fs.FSDataOutputStream create​(org.apache.hadoop.fs.Path path,
                                                              org.apache.hadoop.fs.permission.FsPermission permission,
                                                              boolean overwrite,
                                                              int bufferSize,
                                                              short replication,
                                                              long blockSize,
                                                              org.apache.hadoop.util.Progressable progress)
                                                       throws IOException
        Called on a file write Spark/Hadoop action. Returns an output stream for writing the object at path.
        Specified by:
        create in class org.apache.hadoop.fs.FileSystem
        Throws:
        IOException
      • append

        public org.apache.hadoop.fs.FSDataOutputStream append​(org.apache.hadoop.fs.Path path,
                                                              int i,
                                                              org.apache.hadoop.util.Progressable progressable)
                                                       throws IOException
        Specified by:
        append in class org.apache.hadoop.fs.FileSystem
        Throws:
        IOException
      • rename

        public boolean rename​(org.apache.hadoop.fs.Path src,
                              org.apache.hadoop.fs.Path dst)
                       throws IOException
        Rename, behaving similarly to the POSIX "mv" command, but non-atomically.
        1. Rename is only supported for uncommitted data on the same branch.
        2. The following rename scenarios are supported:
           * file -> existing file: rename(src.txt, existing-dst.txt) -> existing-dst.txt; existing-dst.txt is overwritten.
           * file -> existing directory: rename(src.txt, existing-dstdir) -> existing-dstdir/src.txt.
           * file -> non-existing destination: the src file is renamed to a file with the destination name. rename(src.txt, non-existing-dst) -> non-existing-dst, where non-existing-dst is a file.
           * directory -> non-existing directory: rename(srcDir (containing srcDir/a.txt), non-existing-dstdir) -> non-existing-dstdir/a.txt.
           * directory -> existing directory: rename(srcDir (containing srcDir/a.txt), existing-dstdir) -> existing-dstdir/srcDir/a.txt.
        3. The rename dst path can be an uncommitted file, which will be overwritten as a result of the rename operation.
        4. The mtime of the src object is not preserved.
        Specified by:
        rename in class org.apache.hadoop.fs.FileSystem
        Throws:
        IOException
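        The file-level scenarios above can be sketched as a small destination-resolution rule. This is a simplified illustration, not the lakeFS implementation; directory renames additionally remap child keys as described in the directory cases:

        ```java
        // Simplified sketch of how a rename destination is resolved for a single
        // file, following the documented scenarios. Hypothetical helper class,
        // not part of the lakeFS client.
        public class RenameDstSketch {
            static String resolveFileDst(String src, String dst,
                                         boolean dstExists, boolean dstIsDir) {
                String name = src.substring(src.lastIndexOf('/') + 1);
                if (dstExists && dstIsDir) {
                    // file -> existing directory: the file moves under the directory
                    return dst + "/" + name;
                }
                // file -> existing file (overwritten) or file -> non-existing name
                return dst;
            }

            public static void main(String[] args) {
                System.out.println(resolveFileDst("branch/src.txt", "branch/existing-dst.txt", true, false));
                System.out.println(resolveFileDst("branch/src.txt", "branch/existing-dstdir", true, true));
                System.out.println(resolveFileDst("branch/src.txt", "branch/non-existing-dst", false, false));
            }
        }
        ```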
      • delete

        public boolean delete​(org.apache.hadoop.fs.Path path,
                              boolean recursive)
                       throws IOException
        Specified by:
        delete in class org.apache.hadoop.fs.FileSystem
        Throws:
        IOException
      • setWorkingDirectory

        public void setWorkingDirectory​(org.apache.hadoop.fs.Path path)
        Specified by:
        setWorkingDirectory in class org.apache.hadoop.fs.FileSystem
      • getWorkingDirectory

        public org.apache.hadoop.fs.Path getWorkingDirectory()
        Specified by:
        getWorkingDirectory in class org.apache.hadoop.fs.FileSystem
      • mkdirs

        public boolean mkdirs​(org.apache.hadoop.fs.Path path,
                              org.apache.hadoop.fs.permission.FsPermission fsPermission)
                       throws IOException
        Make the given path and all non-existent parents into directories. We use the same technique as the S3A implementation: a zero-length object whose key ends with the delimiter ('/') serves as a marker that keeps the directory in existence. When an object is written into the directory, the marker can be deleted.
        Specified by:
        mkdirs in class org.apache.hadoop.fs.FileSystem
        Parameters:
        path - path to create
        fsPermission - to apply (passing to the underlying filesystem)
        Returns:
        true if the operation succeeded
        Throws:
        IOException - translated from the underlying API exception
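        The directory-marker convention described above can be sketched as simple key manipulation (an illustrative helper, assuming the '/' delimiter; not the actual lakeFS implementation):

        ```java
        // Illustrative sketch: a directory such as "dir/sub" is kept alive by a
        // zero-length object whose key ends with the '/' delimiter, e.g. "dir/sub/".
        public class DirMarkerSketch {
            static String markerKeyFor(String directoryPath) {
                return directoryPath.endsWith("/") ? directoryPath : directoryPath + "/";
            }

            public static void main(String[] args) {
                System.out.println(markerKeyFor("dir/sub"));
            }
        }
        ```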
      • getFileStatus

        public LakeFSFileStatus getFileStatus​(org.apache.hadoop.fs.Path path)
                                       throws IOException
        Return a file status object that represents the path.
        Specified by:
        getFileStatus in class org.apache.hadoop.fs.FileSystem
        Parameters:
        path - to a file or directory
        Returns:
        a LakeFSFileStatus object
        Throws:
        FileNotFoundException - when the path does not exist
        IOException - on an API call failure or an underlying filesystem exception
      • getScheme

        public String getScheme()
        Return the protocol scheme for the FileSystem.
        Overrides:
        getScheme in class org.apache.hadoop.fs.FileSystem
        Returns:
        lakefs scheme
      • globStatus

        public org.apache.hadoop.fs.FileStatus[] globStatus​(org.apache.hadoop.fs.Path pathPattern)
                                                     throws IOException
        Overrides:
        globStatus in class org.apache.hadoop.fs.FileSystem
        Throws:
        IOException
      • exists

        public boolean exists​(org.apache.hadoop.fs.Path path)
                       throws IOException
        Overrides:
        exists in class org.apache.hadoop.fs.FileSystem
        Throws:
        IOException
      • pathToObjectLocation

        @Nonnull
        public io.lakefs.ObjectLocation pathToObjectLocation​(org.apache.hadoop.fs.Path path)
        Returns the lakeFS ObjectLocation (repository, ref, and path) derived from the filesystem path.
        Parameters:
        path - to extract information from.
        Returns:
        lakeFS Location with repository, ref and path
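        A lakeFS path has the form lakefs://&lt;repository&gt;/&lt;ref&gt;/&lt;object-path&gt;. The extraction can be sketched as below; this is an illustrative parser returning a String array, whereas the real method returns io.lakefs.ObjectLocation:

        ```java
        // Illustrative sketch of splitting a lakefs:// URI into repository, ref,
        // and object path. Hypothetical helper, not the lakeFS implementation.
        public class ObjectLocationSketch {
            static String[] parse(String path) {
                String scheme = "lakefs://";
                if (!path.startsWith(scheme)) {
                    throw new IllegalArgumentException("not a lakefs path: " + path);
                }
                // Split into at most 3 parts: repository, ref, and the rest of the key.
                String[] parts = path.substring(scheme.length()).split("/", 3);
                String repository = parts.length > 0 ? parts[0] : "";
                String ref = parts.length > 1 ? parts[1] : "";
                String objectPath = parts.length > 2 ? parts[2] : "";
                return new String[] { repository, ref, objectPath };
            }

            public static void main(String[] args) {
                String[] loc = parse("lakefs://example-repo/main/data/file.txt");
                System.out.println(loc[0] + " " + loc[1] + " " + loc[2]);
            }
        }
        ```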
      • toLocatedFileStatusIterator

        public static org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus> toLocatedFileStatusIterator​(org.apache.hadoop.fs.RemoteIterator<? extends org.apache.hadoop.fs.LocatedFileStatus> iterator)