Package chat.octet.model.beans
Class LlamaModelQuantizeParams
java.lang.Object
chat.octet.model.beans.LlamaModelQuantizeParams
Llama model quantize params entity
Author: William
Field Summary
Fields
- int thread — number of threads to use for quantizing; if <= 0, std::thread::hardware_concurrency() is used
- int modelFileType — quantize to this llama_ftype
- int outputTensorType — output tensor type
- int tokenEmbeddingType — token embeddings tensor type
- boolean allowRequantize — allow quantizing non-f32/f16 tensors
- boolean quantizeOutputTensor — quantize output.weight
- boolean onlyCopy — only copy tensors; ftype, allow_requantize and quantize_output_tensor are ignored
- boolean pure — disable k-quant mixtures and quantize all tensors to the same type
- boolean keepSplit — quantize to the same number of shards
Constructor Summary
Constructors
LlamaModelQuantizeParams()
Field Details
thread
public int thread
Number of threads to use for quantizing; if <= 0, std::thread::hardware_concurrency() is used.

modelFileType
public int modelFileType
Quantize to this llama_ftype.

outputTensorType
public int outputTensorType
Output tensor type.

tokenEmbeddingType
public int tokenEmbeddingType
Token embeddings tensor type.

allowRequantize
public boolean allowRequantize
Allow quantizing non-f32/f16 tensors.

quantizeOutputTensor
public boolean quantizeOutputTensor
Quantize output.weight.

onlyCopy
public boolean onlyCopy
Only copy tensors; ftype, allow_requantize and quantize_output_tensor are ignored.

pure
public boolean pure
Disable k-quant mixtures and quantize all tensors to the same type.

keepSplit
public boolean keepSplit
Quantize to the same number of shards.
Constructor Details
-
LlamaModelQuantizeParams
public LlamaModelQuantizeParams()
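The fields above mirror llama.cpp's llama_model_quantize_params struct. A minimal sketch of configuring an instance before running quantization; the local stand-in class below only mirrors the documented public fields so the snippet compiles on its own (real code would import the library class), and the ftype value 15 (Q4_K_M in llama.cpp's llama_ftype enum) is an assumption for illustration:

```java
// Stand-in mirroring the documented public fields of
// chat.octet.model.beans.LlamaModelQuantizeParams, included only so this
// example is self-contained; use the library class in real code.
class LlamaModelQuantizeParams {
    public int thread;                   // <= 0: use std::thread::hardware_concurrency()
    public int modelFileType;            // quantize to this llama_ftype
    public int outputTensorType;         // output tensor type
    public int tokenEmbeddingType;       // token embeddings tensor type
    public boolean allowRequantize;      // allow quantizing non-f32/f16 tensors
    public boolean quantizeOutputTensor; // quantize output.weight
    public boolean onlyCopy;             // only copy tensors; other flags ignored
    public boolean pure;                 // disable k-quant mixtures
    public boolean keepSplit;            // quantize to the same number of shards
}

public class QuantizeParamsExample {
    public static void main(String[] args) {
        LlamaModelQuantizeParams params = new LlamaModelQuantizeParams();
        params.thread = 0;                  // auto-detect hardware thread count
        params.modelFileType = 15;          // assumed: Q4_K_M in llama_ftype
        params.allowRequantize = false;     // reject non-f32/f16 source tensors
        params.quantizeOutputTensor = true; // also quantize output.weight
        params.keepSplit = true;            // preserve shard count of split models
        System.out.println("ftype=" + params.modelFileType
                + " threads=" + params.thread);
    }
}
```

The entity is then passed to the library's quantize call; since all fields are public, no builder or setters are required.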