Package chat.octet.model.beans
Class LlamaContextParams
java.lang.Object
chat.octet.model.beans.LlamaContextParams
Llama context params entity
- Author:
- William
Field Summary
Fields
- int batch: prompt processing batch size.
- int ctx: text context size.
- int dataTypeK: data type for K cache.
- int dataTypeV: data type for V cache.
- float defragThold: defragment the KV cache if holes/size > thold, < 0 disabled (default).
- boolean embedding: embedding mode only.
- boolean flashAttn: whether to use flash attention [EXPERIMENTAL].
- boolean logitsAll: the llama_eval() call computes all logits, not just the last one.
- boolean offloadKqv: whether to offload the KQV ops (including the KV cache) to GPU.
- int poolingType: whether to pool (sum) embedding results by sequence id (ignored if no pooling layer).
- float ropeFreqBase: RoPE base frequency.
- float ropeFreqScale: RoPE frequency scaling factor.
- int ropeScalingType: RoPE scaling type, from `enum llama_rope_scaling_type`.
- int seed: RNG seed, -1 for random.
- int seqMax: max number of sequences (i.e. distinct states for recurrent models).
- int threads: number of threads used for generation.
- int threadsBatch: number of threads used for prompt and batch processing.
- int ubatch: physical maximum batch size.
- float yarnAttnFactor: YaRN magnitude scaling factor.
- float yarnBetaFast: YaRN low correction dim.
- float yarnBetaSlow: YaRN high correction dim.
- float yarnExtFactor: YaRN extrapolation mix factor, NaN = from model.
- int yarnOrigCtx: YaRN original context size.
Constructor Summary
Constructors
Method Summary
Field Details
seed
public int seed
RNG seed, -1 for random.

ctx
public int ctx
text context size.

batch
public int batch
prompt processing batch size.

ubatch
public int ubatch
physical maximum batch size.

seqMax
public int seqMax
max number of sequences (i.e. distinct states for recurrent models).

threads
public int threads
number of threads used for generation.

threadsBatch
public int threadsBatch
number of threads used for prompt and batch processing.

ropeScalingType
public int ropeScalingType
RoPE scaling type, from `enum llama_rope_scaling_type`.

poolingType
public int poolingType
whether to pool (sum) embedding results by sequence id (ignored if no pooling layer).

yarnExtFactor
public float yarnExtFactor
YaRN extrapolation mix factor, NaN = from model.

yarnAttnFactor
public float yarnAttnFactor
YaRN magnitude scaling factor.

yarnBetaFast
public float yarnBetaFast
YaRN low correction dim.

yarnBetaSlow
public float yarnBetaSlow
YaRN high correction dim.

yarnOrigCtx
public int yarnOrigCtx
YaRN original context size.

defragThold
public float defragThold
defragment the KV cache if holes/size > thold, < 0 disabled (default).

ropeFreqBase
public float ropeFreqBase
RoPE base frequency.

ropeFreqScale
public float ropeFreqScale
RoPE frequency scaling factor.

dataTypeK
public int dataTypeK
data type for K cache.

dataTypeV
public int dataTypeV
data type for V cache.

logitsAll
public boolean logitsAll
the llama_eval() call computes all logits, not just the last one.

embedding
public boolean embedding
embedding mode only.

offloadKqv
public boolean offloadKqv
whether to offload the KQV ops (including the KV cache) to GPU.

flashAttn
public boolean flashAttn
whether to use flash attention [EXPERIMENTAL].
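The defragThold rule (defragment the KV cache when the fraction of holes exceeds the threshold, disabled when the threshold is negative) can be sketched as a small predicate. This is an illustrative reimplementation of the documented rule, not code from the library; the method name and signature are hypothetical:

```java
public class DefragTholdSketch {
    // Illustrative: returns true when a KV cache with `holes` free cells out of
    // `size` total cells should be defragmented, per the documented rule:
    // defragment if holes/size > thold; a negative thold disables defragmentation.
    static boolean shouldDefrag(int holes, int size, float thold) {
        if (thold < 0.0f) {
            return false; // < 0 disables defragmentation (the default)
        }
        return (float) holes / (float) size > thold;
    }

    public static void main(String[] args) {
        System.out.println(shouldDefrag(600, 1000, 0.5f));  // 0.6 > 0.5 -> true
        System.out.println(shouldDefrag(100, 1000, -1.0f)); // disabled -> false
    }
}
```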
Constructor Details

LlamaContextParams
public LlamaContextParams()
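Since all configuration fields are public and the class exposes a no-arg constructor, a context can be configured by direct field assignment. The snippet below is a minimal sketch; the specific values are illustrative, and how the populated bean is handed to the rest of the library (e.g. when building a model) is not shown here:

```java
import chat.octet.model.beans.LlamaContextParams;

public class ContextParamsExample {
    public static void main(String[] args) {
        LlamaContextParams params = new LlamaContextParams();
        params.seed = -1;          // RNG seed, -1 for random
        params.ctx = 4096;         // text context size (illustrative value)
        params.batch = 512;        // prompt processing batch size
        params.threads = 8;        // threads used for generation
        params.threadsBatch = 8;   // threads for prompt/batch processing
        params.offloadKqv = true;  // offload KQV ops (incl. KV cache) to GPU
        params.embedding = false;  // not embedding-only mode
    }
}
```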