Class LlamaService

java.lang.Object
chat.octet.model.LlamaService

public class LlamaService extends Object
Llama.cpp API

C++ source: llamajava.h, llamajava.cpp

Since:
b3091 20240611
Author:
William
  • Constructor Details

    • LlamaService

      public LlamaService()
  • Method Details

    • initNative

      public static void initNative()
      Initializes the JNI context.
    • getLlamaModelDefaultParams

      public static LlamaModelParams getLlamaModelDefaultParams()
      Get llama model default params.
      Returns:
      LlamaModelParams
    • getLlamaContextDefaultParams

      public static LlamaContextParams getLlamaContextDefaultParams()
      Get llama context default params.
      Returns:
      LlamaContextParams
    • getLlamaModelQuantizeDefaultParams

      public static LlamaModelQuantizeParams getLlamaModelQuantizeDefaultParams()
      Get llama model quantize default params.
      Returns:
      LlamaModelQuantizeParams
    • llamaBackendInit

      public static void llamaBackendInit()
      Initialize the llama + ggml backend.
    • llamaNumaInit

      public static void llamaNumaInit(int numaStrategy)
      Initialize NUMA optimizations.
    • llamaBackendFree

      public static void llamaBackendFree()
      Call once at the end of the program. NOTE: currently only used for MPI.
    • loadLlamaModelFromFile

      public static void loadLlamaModelFromFile(String modelPath, LlamaModelParams params) throws ModelException
      Load Llama model from file.
      Parameters:
      modelPath - Llama model file path.
      params - Llama model params.
      Throws:
      ModelException
    • createNewContextWithModel

      public static void createNewContextWithModel(LlamaContextParams params) throws ModelException
      Create new context with model.
      Parameters:
      params - Llama context params.
      Throws:
      ModelException
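Taken together, the initialization, loading, and release methods above form the typical startup and shutdown sequence. A minimal sketch (the import paths for the parameter classes are assumptions; adjust them to the actual package layout of your build):

```java
import chat.octet.model.LlamaService;
import chat.octet.model.beans.LlamaModelParams;   // assumed package path
import chat.octet.model.beans.LlamaContextParams; // assumed package path

public class LoadModelExample {
    public static void main(String[] args) throws Exception {
        LlamaService.initNative();        // load the JNI library
        LlamaService.llamaBackendInit();  // initialize the llama + ggml backend

        // Load the model with default parameters, then create a context.
        LlamaModelParams modelParams = LlamaService.getLlamaModelDefaultParams();
        LlamaService.loadLlamaModelFromFile("/path/to/model.gguf", modelParams);

        LlamaContextParams contextParams = LlamaService.getLlamaContextDefaultParams();
        LlamaService.createNewContextWithModel(contextParams);

        System.out.printf("vocab=%d ctx=%d%n",
                LlamaService.getVocabSize(), LlamaService.getContextSize());

        LlamaService.release();           // close the model, free resources
        LlamaService.llamaBackendFree();  // once, at the end of the program
    }
}
```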
    • release

      public static void release()
      Close model and release all resources.
    • isMmapSupported

      public static boolean isMmapSupported()
      Check whether mmap is supported.
      Returns:
      boolean
    • isMlockSupported

      public static boolean isMlockSupported()
      Check whether mlock is supported.
      Returns:
      boolean
    • isGpuOffloadSupported

      public static boolean isGpuOffloadSupported()
      Check whether gpu_offload is supported.
      Returns:
      boolean
    • getVocabSize

      public static int getVocabSize()
      Get model vocab size.
      Returns:
      int
    • getContextSize

      public static int getContextSize()
      Get model context size.
      Returns:
      int
    • loadLoraModelFromFile

      public static int loadLoraModelFromFile(String loraPath, float loraScale, String baseModelPath, int threads) throws ModelException
      Apply a LoRA adapter to a loaded model. baseModelPath is the path to a higher-quality model to use as a base for the layers modified by the adapter; it can be NULL to use the currently loaded model. The model needs to be reloaded before applying a new adapter, otherwise the adapter will be applied on top of the previous one.
      Parameters:
      loraPath - LoRA adapter file path.
      loraScale - LoRA scale.
      baseModelPath - Base model file path.
      threads - Thread number.
      Returns:
      int, Returns 0 on success, non-zero on failure.
      Throws:
      ModelException
    • getLogits

      public static float[] getLogits(int index)
      Get logits at the given index; the default index is 0.
      Parameters:
      index - index
      Returns:
      float[], Returns one-dimensional float array.
    • getEmbedding

      public static float[] getEmbedding()
      Get the embedding vector.
      Returns:
      float[], Returns embedding float array.
    • getTokenAttr

      public static int getTokenAttr(int token)
      Get token attribute code.
      Parameters:
      token - Token id.
      Returns:
      int
    • getTokenBOS

      public static int getTokenBOS()
      Get special BOS token.
      Returns:
      int, Returns token id.
    • getTokenEOS

      public static int getTokenEOS()
      Get special EOS token.
      Returns:
      int, Returns token id.
    • tokenize

      public static int tokenize(byte[] buf, int bufferLength, int[] tokens, int maxTokens, boolean addBos, boolean specialTokens)
      Convert the provided text into tokens. The tokens array must be large enough to hold the resulting tokens. Returns the number of tokens on success, no more than maxTokens.
      Parameters:
      buf - Text byte buffer.
      bufferLength - Text byte buffer length.
      tokens - Empty token array used to receive the returned tokens.
      maxTokens - Maximum number of tokens; defaults to the context size.
      addBos - Add special BOS token.
      specialTokens - Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext. Does not insert a leading space.
      Returns:
      int, Returns a negative number on failure, else the number of tokens that would have been returned.
    • tokenToPiece

      public static int tokenToPiece(int token, byte[] buf, int bufferLength, boolean special)
      Convert the token id to text piece.
      Parameters:
      token - Token id.
      buf - Output byte buffer used to receive the piece.
      bufferLength - Output byte buffer length.
      special - If true, special tokens are rendered in the output.
      Returns:
      int, Returns byte buffer length of the piece.
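The two conversion methods above can be combined into a text round trip. A minimal sketch, assuming a model and context are already loaded via the methods documented earlier:

```java
import java.nio.charset.StandardCharsets;
import chat.octet.model.LlamaService; // assumed package path

public class RoundTripExample {
    public static void main(String[] args) {
        byte[] text = "Hello, llama!".getBytes(StandardCharsets.UTF_8);
        int[] tokens = new int[LlamaService.getContextSize()];
        int count = LlamaService.tokenize(text, text.length, tokens, tokens.length, true, false);
        if (count < 0) {
            throw new RuntimeException("Tokenization failed: " + count);
        }
        // Convert each token id back to its text piece.
        StringBuilder sb = new StringBuilder();
        byte[] piece = new byte[64]; // scratch buffer, reused per token
        for (int i = 0; i < count; i++) {
            int len = LlamaService.tokenToPiece(tokens[i], piece, piece.length, false);
            sb.append(new String(piece, 0, len, StandardCharsets.UTF_8));
        }
        System.out.println(sb);
    }
}
```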
    • getSamplingMetrics

      public static Metrics getSamplingMetrics(boolean reset)
      Get sampling metrics.
      Parameters:
      reset - Reset the counter when finished.
      Returns:
      Metrics
    • getSystemInfo

      public static String getSystemInfo()
      Get system parameter information.
      Returns:
      String
    • sampling

      public static int sampling(float[] logits, int[] lastTokens, int lastTokensSize, float penalty, float alphaFrequency, float alphaPresence, boolean penalizeNL, int mirostatMode, float mirostatTAU, float mirostatETA, float temperature, int topK, float topP, float tsf, float typical, float minP, float dynatempRange, float dynatempExponent, int sequenceId, int pastTokenSize) throws DecodeException
      Inference sampling the next token.
      Parameters:
      logits - User-defined logits; adjustments can be made via LogitsProcessor.
      lastTokens - Last token array.
      lastTokensSize - Last token array size.
      penalty - Controls the repetition of token sequences in the generated text.
      alphaFrequency - Repeat alpha frequency penalty.
      alphaPresence - Repeat alpha presence penalty.
      penalizeNL - Disable penalization for newline tokens when applying the repeat penalty.
      mirostatMode - Mirostat sampling: mode selector, controlling perplexity during text generation.
      mirostatTAU - Mirostat sampling: target entropy.
      mirostatETA - Mirostat sampling: learning rate.
      temperature - Adjusts the randomness of the generated text.
      topK - Top-K sampling: limit the next token selection to the K most probable tokens.
      topP - Top-P sampling: limit the next token selection to a subset of tokens with a cumulative probability above a threshold P.
      tsf - Tail-free sampling (TFS): enable tail-free sampling with parameter z.
      typical - Typical sampling: enable typical sampling with parameter p.
      minP - Min-P sampling: set a minimum base probability threshold for token selection.
      dynatempRange - Dynamic temperature sampling: temperature range.
      dynatempExponent - Dynamic temperature sampling: temperature exponent.
      sequenceId - Generation sequence id.
      pastTokenSize - Past token size.
      Returns:
      int, Returns the sampled token id.
      Throws:
      DecodeException
    • sampling

      public static int sampling(GenerateParameter generateParams, float[] logits, int[] lastTokens, int sequenceId, int pastTokenSize) throws DecodeException
      Inference sampling the next token.
      Parameters:
      generateParams - Generation parameters.
      logits - User-defined logits, Adjustments can be made via LogitsProcessor.
      lastTokens - Last token array.
      sequenceId - Generation sequence id.
      pastTokenSize - Past token size.
      Returns:
      int, Returns the sampled token id.
      Throws:
      DecodeException
    • loadLlamaGrammar

      public static boolean loadLlamaGrammar(String grammarRules)
      Load llama grammar by rules.
      Parameters:
      grammarRules - Grammar rules.
      Returns:
      boolean, Returns true on success, false on failure.
    • batchDecode

      public static int batchDecode(int sequenceId, int[] tokens, int inputLength, int pastTokenSize)
      Batch decoding.
      Parameters:
      sequenceId - Specify a unique generation sequence id.
      tokens - Array of tokens to be decoded.
      inputLength - Input context length.
      pastTokenSize - Past token size.
      Returns:
      int, Returns 0 on success, non-zero on failure.
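batchDecode and the sampling overloads above can be wired into a simple generation loop: decode the prompt, then alternate getLogits, sampling, and single-token decoding. A sketch under stated assumptions (the GenerateParameter package path and builder are assumptions; a model and context must already be loaded):

```java
import java.util.ArrayList;
import java.util.List;
import chat.octet.model.LlamaService;                 // assumed package path
import chat.octet.model.parameters.GenerateParameter; // assumed package path

public class GenerateLoopExample {
    public static void main(String[] args) throws Exception {
        // Assumption: GenerateParameter exposes a builder with usable defaults.
        GenerateParameter params = GenerateParameter.builder().build();

        int sequenceId = 1;
        int past = 0;
        int[] prompt = LlamaService.tokenize("Once upon a time", true, false);
        List<Integer> history = new ArrayList<>();
        for (int t : prompt) history.add(t);

        // Decode the whole prompt in one batch.
        if (LlamaService.batchDecode(sequenceId, prompt, prompt.length, past) != 0) {
            throw new RuntimeException("Prompt decoding failed");
        }
        past += prompt.length;

        int eos = LlamaService.getTokenEOS();
        for (int step = 0; step < 64; step++) {
            float[] logits = LlamaService.getLogits(0);
            int[] lastTokens = history.stream().mapToInt(Integer::intValue).toArray();
            int next = LlamaService.sampling(params, logits, lastTokens, sequenceId, past);
            if (next == eos) break;
            history.add(next);
            // Decode the sampled token so the next step sees fresh logits.
            LlamaService.batchDecode(sequenceId, new int[]{next}, 1, past);
            past++;
        }
    }
}
```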
    • clearCache

      public static void clearCache(int sequenceId, int posStart, int posEnd)
      Clear cache in K-V sequences.
      Parameters:
      sequenceId - Generation sequence id.
      posStart - Start position.
      posEnd - End position.
    • clearCache

      public static void clearCache(int sequenceId)
      Clear cache in K-V sequences.
      Parameters:
      sequenceId - Generation sequence id.
    • llamaModelQuantize

      public static int llamaModelQuantize(String sourceModelFilePath, String outputModelFilePath, LlamaModelQuantizeParams params)
      Quantize the model.
      Parameters:
      sourceModelFilePath - Source model file path.
      outputModelFilePath - Output model file path.
      params - Quantize parameters.
      Returns:
      int, Returns 0 on success, non-zero on failure.
    • llamaModelQuantize

      public static int llamaModelQuantize(String sourceModelFilePath, String outputModelFilePath, ModelFileType modelFileType)
      Quantize the model.
      Parameters:
      sourceModelFilePath - Source model file path.
      outputModelFilePath - Output model file path.
      modelFileType - Model file type.
      Returns:
      int, Returns 0 on success, non-zero on failure.
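As an example, the ModelFileType overload can convert an F16 GGUF file to 4-bit. A minimal sketch (the import paths and the enum constant name, which mirrors llama.cpp's ftype naming, are assumptions):

```java
import chat.octet.model.LlamaService;        // assumed package path
import chat.octet.model.enums.ModelFileType; // assumed package path

public class QuantizeExample {
    public static void main(String[] args) {
        int rc = LlamaService.llamaModelQuantize(
                "/path/to/model-f16.gguf",
                "/path/to/model-q4_0.gguf",
                ModelFileType.LLAMA_FTYPE_MOSTLY_Q4_0); // assumed constant name
        if (rc != 0) {
            throw new RuntimeException("Quantization failed, code " + rc);
        }
    }
}
```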
    • tokenize

      public static int[] tokenize(String text, boolean addBos, boolean specialTokens)
      Convert the provided text into tokens.
      Parameters:
      text - Input text.
      addBos - Add special BOS token.
      specialTokens - Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext. Does not insert a leading space.
      Returns:
      int[], Returns the resulting token array.
    • getLlamaTokenAttr

      public static LlamaTokenAttr getLlamaTokenAttr(int token)
      Get the token attribute definition.
      Parameters:
      token - Token id.
      Returns:
      LlamaTokenAttr
    • llamaModelMeta

      public static String llamaModelMeta(String key)
      Retrieves metadata of the llama model for the given key. This native method obtains specific information about the model, identified by the key parameter; the information may include model structure, parameters, version, etc.
      Parameters:
      key - The key used to identify the specific metadata information to retrieve.
      Returns:
      String, The metadata value corresponding to the key, e.g. a description of the model, the model's version number, or other information.
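For example, common GGUF metadata keys can be queried once a model is loaded. A short sketch (the key names follow GGUF conventions and are assumptions, not guaranteed by this API):

```java
import chat.octet.model.LlamaService; // assumed package path

public class ModelMetaExample {
    public static void main(String[] args) {
        // Assumes a model has already been loaded via loadLlamaModelFromFile.
        String architecture = LlamaService.llamaModelMeta("general.architecture");
        String name = LlamaService.llamaModelMeta("general.name");
        System.out.println("architecture=" + architecture + ", name=" + name);
    }
}
```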