Class LlamaModelQuantizeParams

java.lang.Object
chat.octet.model.beans.LlamaModelQuantizeParams

public class LlamaModelQuantizeParams extends Object
Llama model quantization parameters entity
Author:
William
  • Field Details

    • thread

      public int thread
number of threads to use for quantizing; if <= 0, std::thread::hardware_concurrency() will be used
    • modelFileType

      public int modelFileType
      quantize to this llama_ftype
    • outputTensorType

      public int outputTensorType
output tensor type
    • tokenEmbeddingType

      public int tokenEmbeddingType
token embeddings tensor type
    • allowRequantize

      public boolean allowRequantize
      allow quantizing non-f32/f16 tensors
    • quantizeOutputTensor

      public boolean quantizeOutputTensor
      quantize output.weight
    • onlyCopy

      public boolean onlyCopy
only copy tensors; ftype, allow_requantize, and quantize_output_tensor are ignored
    • pure

      public boolean pure
      disable k-quant mixtures and quantize all tensors to the same type
    • keepSplit

      public boolean keepSplit
quantize to the same number of shards
  • Constructor Details

    • LlamaModelQuantizeParams

      public LlamaModelQuantizeParams()