SessionConfig

The `SessionConfig` type has the following properties:
| Property | Description |
| --- | --- |
| `preset` | See below. |
| `samplingSeed` | At each step of model inference, the next token is chosen from the output distribution by a sampling method, which is usually stochastic. If you need reproducible results for testing, set a fixed `samplingSeed`. |
| `contextLength` | Each model has its own maximum context length, which limits how long your conversation with the model can be. If you know you won’t need a long context, set a smaller value to save RAM during inference. |
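For illustration, here’s a minimal Swift sketch of what configuring a session might look like. The types below are toy stand-ins that mirror the documented properties; the real SDK’s definitions, initializer, and optionality are assumptions.

```swift
// Toy stand-ins mirroring the documented properties; the real SDK's
// definitions and initializer may differ.
enum SessionPreset {
    case general    // "General": no speculation is performed (see below)
}

struct SessionConfig {
    var preset: SessionPreset
    var samplingSeed: UInt64?   // assumption: nil means nondeterministic sampling
    var contextLength: Int?     // assumption: nil means the model's full context
}

// A reproducible, memory-conscious configuration for tests.
let testConfig = SessionConfig(
    preset: .general,
    samplingSeed: 42,     // fixed seed -> the same tokens on every run
    contextLength: 2048   // smaller context window to save RAM during inference
)
```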
SessionPreset
Basically, the preset defines the method of speculative decoding. When an LLM generates text, it goes step by step, predicting one token at a time; this is autoregressive decoding, and it’s slow because each token requires a full forward pass. With speculative decoding, we use heuristics to guess several tokens ahead and then validate them all in a single model run.
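To make the draft-and-verify idea concrete, here is a toy Swift sketch. Nothing in it comes from the SDK: `modelNext` stands in for the LLM’s greedy next-token choice, `draftGuess` for the cheap heuristic, and in a real implementation all the `modelNext` calls inside one step would come from a single batched forward pass.

```swift
// The "real" model's next token after a prefix (toy: always count up by one).
func modelNext(_ prefix: [Int]) -> Int {
    (prefix.last ?? 0) + 1
}

// Cheap heuristic: guess the next k tokens without running the model.
// Here it happens to agree with the model, so drafts are accepted; a
// heuristic that doesn't match your use case fails at the first token.
func draftGuess(_ prefix: [Int], k: Int) -> [Int] {
    var guess: [Int] = []
    var last = prefix.last ?? 0
    for _ in 0..<k {
        last += 1
        guess.append(last)
    }
    return guess
}

// One speculative step: draft k tokens, then validate the whole draft with
// (conceptually) a single forward pass over prefix + draft. At least one
// token is always gained per pass, because the model's own token is kept.
func speculativeStep(_ prefix: [Int], draftLength k: Int) -> [Int] {
    var accepted = prefix
    for guess in draftGuess(prefix, k: k) {
        let truth = modelNext(accepted)   // in reality: read from the one batched pass
        accepted.append(truth)            // the model's token is always kept
        if truth != guess { break }       // first mismatch invalidates the rest
    }
    return accepted
}

// Four draft tokens accepted in one model run instead of one token:
print(speculativeStep([1, 2, 3], draftLength: 4))   // [1, 2, 3, 4, 5, 6, 7]
```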
While you can use any input with any `SessionPreset`, a wrong choice can lead to performance degradation: if the heuristic doesn’t match your real use case, you’ll end up doing extra computation without any performance boost, since we won’t be able to predict any additional tokens.

If you’re using a thinking model, it’s better to go with the `General` preset, since the other presets won’t give any boost during the thinking phase.

General

No speculation is performed.
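Putting that guidance into the toy `SessionConfig` from the sketch above (again, the real SDK’s initializer may differ):

```swift
// Thinking model: stay with the General preset, since speculation gives no
// boost during the thinking phase (toy types from the earlier sketch).
let thinkingConfig = SessionConfig(
    preset: .general,
    samplingSeed: nil,    // keep sampling stochastic in production
    contextLength: nil    // assumption: nil uses the model's full context
)
```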