Class LlamaService

java.lang.Object
chat.octet.model.LlamaService

public class LlamaService extends Object
Java bindings for the llama.cpp API.

C++ source: llamajava.h, llamajava.cpp

Since:
b1395
Author:
William
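A typical lifecycle strings the static methods below together: initialize the JNI context and backend once, load a model, create a context, run inference, then release everything. The sketch below is a hypothetical example; the model path is a placeholder, and `var` is used so no assumptions are made about the parameter classes' packages.

```java
import chat.octet.model.LlamaService;

public class LlamaLifecycle {
    public static void main(String[] args) throws Exception {
        LlamaService.initNative();             // load the JNI library first
        LlamaService.llamaBackendInit(false);  // init llama + ggml backend, no NUMA

        var modelParams = LlamaService.getLlamaModelDefaultParams();
        LlamaService.loadLlamaModelFromFile("/models/llama-2-7b.Q4_0.gguf", modelParams);

        var contextParams = LlamaService.getLlamaContextDefaultParams();
        LlamaService.createNewContextWithModel(contextParams);

        System.out.println(LlamaService.getSystemInfo());

        // ... tokenize, batchDecode, sampling ...

        LlamaService.release();           // close model, release all resources
        LlamaService.llamaBackendFree();  // once at the end of the program
    }
}
```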
  • Constructor Details

    • LlamaService

      public LlamaService()
  • Method Details

    • initNative

      public static void initNative()
      Initialize the JNI context.
    • getLlamaModelDefaultParams

      public static LlamaModelParams getLlamaModelDefaultParams()
      Get llama model default params.
      Returns:
      LlamaModelParams
    • getLlamaContextDefaultParams

      public static LlamaContextParams getLlamaContextDefaultParams()
      Get llama context default params.
      Returns:
      LlamaContextParams
    • llamaBackendInit

      public static void llamaBackendInit(boolean numa)
      Initialize the llama + ggml backend. If numa is true, use NUMA optimizations. Call once at the start of the program.
      Parameters:
      numa - Use NUMA optimizations.
    • llamaBackendFree

      public static void llamaBackendFree()
      Call once at the end of the program. NOTE: currently only used for MPI.
    • loadLlamaModelFromFile

      public static void loadLlamaModelFromFile(String modelPath, LlamaModelParams params) throws ModelException
      Load Llama model from file.
      Parameters:
      modelPath - Llama model file path.
      params - Llama model params.
      Throws:
      ModelException
    • createNewContextWithModel

      public static void createNewContextWithModel(LlamaContextParams params) throws ModelException
      Create new context with model.
      Parameters:
      params - Llama context params.
      Throws:
      ModelException
    • release

      public static void release()
      Close model and release all resources.
    • isMmapSupported

      public static boolean isMmapSupported()
      Check whether MMAP is supported.
      Returns:
      boolean
    • isMlockSupported

      public static boolean isMlockSupported()
      Check whether MLOCK is supported.
      Returns:
      boolean
    • getVocabSize

      public static int getVocabSize()
      Get model vocab size.
      Returns:
      int
    • getContextSize

      public static int getContextSize()
      Get model context size.
      Returns:
      int
    • loadLoraModelFromFile

      public static int loadLoraModelFromFile(String loraPath, float loraScale, String baseModelPath, int threads) throws ModelException
      Apply a LoRA adapter to a loaded model. baseModelPath is the path to a higher-quality model to use as a base for the layers modified by the adapter; it can be NULL to use the currently loaded model. The model needs to be reloaded before applying a new adapter, otherwise the adapter will be applied on top of the previous one.
      Parameters:
      loraPath - LoRA adapter file path.
      loraScale - LoRA scale.
      baseModelPath - Base model file path.
      threads - Thread number.
      Returns:
      int, Returns 0 on success, a non-zero value on failure.
      Throws:
      ModelException
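As a hedged sketch of the call described above (the adapter path and thread count are placeholders, not values from this API doc):

```java
// Hypothetical: apply a LoRA adapter to the already-loaded model.
int status = LlamaService.loadLoraModelFromFile(
        "/models/adapters/style-lora.gguf",  // loraPath (placeholder)
        1.0f,                                // loraScale: full adapter strength
        null,                                // baseModelPath: NULL = use current model
        4);                                  // threads
if (status != 0) {
    throw new IllegalStateException("Applying LoRA adapter failed: " + status);
}
```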
    • getLogits

      public static float[] getLogits(int index)
      Get logits at the given index; the default index should be 0.
      Parameters:
      index - index
      Returns:
      float[], Returns one-dimensional float array.
    • getEmbedding

      public static float[] getEmbedding()
      Get the embedding.
      Returns:
      float[], Returns embedding float array.
    • getTokenType

      public static int getTokenType(int token)
      Get token type code.
      Parameters:
      token - Token id.
      Returns:
      int
    • getTokenBOS

      public static int getTokenBOS()
      Get special BOS token.
      Returns:
      int, Returns token id.
    • getTokenEOS

      public static int getTokenEOS()
      Get special EOS token.
      Returns:
      int, Returns token id.
    • tokenize

      public static int tokenize(byte[] buf, int bufferLength, int[] tokens, int maxTokens, boolean addBos, boolean specialTokens)
      Convert the provided text into tokens. The tokens array must be large enough to hold the resulting tokens. Returns the number of tokens on success, no more than maxTokens.
      Parameters:
      buf - Text byte buffer.
      bufferLength - Text byte buffer length.
      tokens - Empty token array used to receive the returned tokens.
      maxTokens - Maximum number of tokens; defaults to the context size.
      addBos - Add special BOS token.
      specialTokens - Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext. Does not insert a leading space.
      Returns:
      int, Returns the number of tokens on success; on failure, returns a negative number whose absolute value is the number of tokens that would have been returned.
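A hypothetical use of this buffer-based overload, following the sizing rules above (the input string is a placeholder):

```java
// Tokenize a UTF-8 string into a pre-allocated token id buffer.
byte[] text = "Hello, llama!".getBytes(java.nio.charset.StandardCharsets.UTF_8);
int maxTokens = LlamaService.getContextSize();   // default upper bound
int[] tokens = new int[maxTokens];               // receives the token ids
int count = LlamaService.tokenize(text, text.length, tokens, maxTokens,
        true,    // addBos: prepend the special BOS token
        false);  // specialTokens: control tokens stay plain text
if (count < 0) {
    throw new IllegalStateException("tokenize failed: " + count);
}
int[] promptTokens = java.util.Arrays.copyOf(tokens, count);  // trim to actual length
```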
    • tokenToPiece

      public static int tokenToPiece(int token, byte[] buf, int bufferLength)
      Convert the token id to text piece.
      Parameters:
      token - Token id.
      buf - Output byte buffer that receives the piece.
      bufferLength - Output byte buffer length.
      Returns:
      int, Returns the number of bytes written for the piece.
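A hedged sketch of detokenization: converting token ids back into text one piece at a time. The 64-byte scratch buffer size is an assumption, not a documented limit.

```java
// Convert token ids back to UTF-8 text piece by piece.
int[] promptTokens = LlamaService.tokenize("Hello, llama!", true, false);
StringBuilder text = new StringBuilder();
byte[] buf = new byte[64];  // scratch buffer, assumed large enough per piece
for (int token : promptTokens) {
    int written = LlamaService.tokenToPiece(token, buf, buf.length);
    text.append(new String(buf, 0, written, java.nio.charset.StandardCharsets.UTF_8));
}
```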
    • getSamplingMetrics

      public static Metrics getSamplingMetrics(boolean reset)
      Get sampling metrics.
      Parameters:
      reset - Reset the counter when finished.
      Returns:
      Metrics
    • getSystemInfo

      public static String getSystemInfo()
      Get system parameter information.
      Returns:
      String
    • sampling

      public static int sampling(float[] logits, int[] lastTokens, int lastTokensSize, float penalty, float alphaFrequency, float alphaPresence, boolean penalizeNL, int mirostatMode, float mirostatTAU, float mirostatETA, float temperature, int topK, float topP, float tsf, float typical, int sequenceId, int pastTokenSize) throws DecodeException
      Inference sampling the next token.
      Parameters:
      logits - User-defined logits; adjustments can be made via LogitsProcessor.
      lastTokens - Last token array.
      lastTokensSize - Last token array size.
      penalty - (repeat-penalty) Control the repetition of token sequences in the generated text.
      alphaFrequency - (frequency-penalty) Repeat alpha frequency penalty.
      alphaPresence - (presence-penalty) Repeat alpha presence penalty.
      penalizeNL - (penalize-nl) Whether to penalize newline tokens when applying the repeat penalty.
      mirostatMode - (mirostat) Use Mirostat sampling, controlling perplexity during text generation.
      mirostatTAU - (mirostat-ent) Set the Mirostat target entropy.
      mirostatETA - (mirostat-lr) Set the Mirostat learning rate.
      temperature - (temperature) Adjust the randomness of the generated text.
      topK - (top-k) Limit the next token selection to the K most probable tokens.
      topP - (top-p) Limit the next token selection to a subset of tokens with a cumulative probability above a threshold P.
      tsf - (tfs) Enable tail free sampling with parameter z.
      typical - (typical) Enable typical sampling with parameter p.
      sequenceId - Generation sequence id.
      pastTokenSize - Past token size.
      Returns:
      int, Returns the sampled token id.
      Throws:
      DecodeException
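The sampling parameters above can be wired into a generation loop like the hypothetical one below. The parameter values mirror common llama.cpp defaults, and the loop structure (prefill with batchDecode, then sample token by token) is an assumption, not part of this API doc.

```java
// Hypothetical generation loop: prefill the prompt, then sample until EOS.
int sequenceId = 0;
int[] prompt = LlamaService.tokenize("Once upon a time", true, false);
LlamaService.batchDecode(sequenceId, prompt, prompt.length, 0);  // prefill

int pastTokenSize = prompt.length;
int eos = LlamaService.getTokenEOS();
for (int step = 0; step < 64; step++) {
    float[] logits = LlamaService.getLogits(0);  // default index 0
    int token = LlamaService.sampling(
            logits,
            prompt, prompt.length,  // lastTokens / lastTokensSize
            1.1f,                   // penalty (repeat-penalty)
            0.0f, 0.0f,             // alphaFrequency, alphaPresence
            false,                  // penalizeNL
            0, 5.0f, 0.1f,          // mirostatMode (0 = off), TAU, ETA
            0.8f,                   // temperature
            40, 0.9f,               // topK, topP
            1.0f, 1.0f,             // tsf, typical (1.0 = disabled)
            sequenceId, pastTokenSize);
    if (token == eos) break;        // stop at the special EOS token
    pastTokenSize++;
}
```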
    • loadLlamaGrammar

      public static boolean loadLlamaGrammar(String grammarRules)
      Load llama grammar by rules.
      Parameters:
      grammarRules - Grammar rules.
      Returns:
      boolean, Returns true on success, false on failure.
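A minimal sketch, assuming the rules use llama.cpp's GBNF grammar format (the rule string below is an illustrative placeholder):

```java
// Hypothetical GBNF grammar constraining output to "yes" or "no".
String rules = "root ::= \"yes\" | \"no\"";
if (!LlamaService.loadLlamaGrammar(rules)) {
    throw new IllegalStateException("Failed to load grammar rules");
}
```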
    • batchDecode

      public static int batchDecode(int sequenceId, int[] tokens, int inputLength, int pastTokenSize)
      Batch decoding.
      Parameters:
      sequenceId - Specify a unique generation sequence id.
      tokens - Array of tokens to be decoded.
      inputLength - Input context length.
      pastTokenSize - Past token size.
      Returns:
      int, Returns 0 on success, a non-zero value on failure.
    • clearCache

      public static void clearCache(int sequenceId, int posStart, int posEnd)
      Clear cache in K-V sequences.
      Parameters:
      sequenceId - Generation sequence id.
      posStart - Start position.
      posEnd - End position.
    • clearCache

      public static void clearCache(int sequenceId)
      Clear cache in K-V sequences.
      Parameters:
      sequenceId - Generation sequence id.
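A hedged sketch of both overloads, e.g. for resetting a sequence between independent generations (the exact position-range semantics are not stated in this doc):

```java
// Hypothetical cache reset between independent generations.
LlamaService.clearCache(0);          // drop all cached K-V data for sequence 0
// Or clear only part of the sequence:
LlamaService.clearCache(0, 10, 20);  // positions 10 through 20 of sequence 0
```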
    • tokenize

      public static int[] tokenize(String text, boolean addBos, boolean specialTokens)
      Convert the provided text into tokens.
      Parameters:
      text - Input text.
      addBos - Add special BOS token.
      specialTokens - Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext. Does not insert a leading space.
      Returns:
      int[], Returns the resulting token id array.
    • getLlamaTokenType

      public static LlamaTokenType getLlamaTokenType(int token)
      Get token type define.
      Parameters:
      token - Token id.
      Returns:
      LlamaTokenType