Config type, which has the following properties:
Config

| Property | Description |
| --- | --- |
| preset | Selects the speculative-decoding method. See the Preset section below. |
| samplingSeed | At each step of inference, the next token is sampled from the model's output distribution, and the sampling method is usually stochastic. If you need reproducible results, for example in tests, set a fixed samplingSeed. |
| contextLength | Each model has its own maximum context length, which bounds how long your conversation with the model can be. If you know you won't need a long context, you can set a smaller value to save RAM during inference. |
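
To make the options concrete, here is a minimal sketch of building such a config in Swift. The initializer, parameter labels, and the enum case are assumptions for illustration, not the library's confirmed API:

```swift
// A hedged sketch: the real Config initializer may differ.
let config = Config(
    preset: .summarization,  // speculative-decoding preset (see Preset below)
    samplingSeed: 42,        // fixed seed for reproducible sampling in tests
    contextLength: 2048      // smaller context saves RAM during inference
)
```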
Preset
The preset defines the method of speculative decoding. When an LLM generates text, it goes step by step, predicting one token at a time; this is autoregressive decoding, and it is slow because every token requires a full forward pass. With speculative decoding, heuristics are used to guess multiple tokens in a single model run, and the model then validates them, as sketched below.
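
Here is a toy, self-contained illustration of that draft-and-verify idea. It is not the SDK's implementation: `modelNextToken`, `draft`, and `verify` are made-up stand-ins, and a real engine would validate all drafted positions in one batched forward pass rather than token by token:

```swift
// Toy model of speculative decoding: a cheap heuristic drafts several
// tokens, then the expensive model checks them and keeps the longest
// accepted prefix.

typealias Token = Int

// Stand-in for the expensive model: one "forward pass" per call.
func modelNextToken(_ context: [Token]) -> Token {
    (context.last ?? 0) &+ 1
}

// Cheap drafter: heuristically guess the next `count` tokens.
func draft(_ context: [Token], count: Int) -> [Token] {
    var ctx = context
    var out: [Token] = []
    for _ in 0..<count {
        let guess = (ctx.last ?? 0) &+ 1  // heuristic; often wrong for a real model
        out.append(guess)
        ctx.append(guess)
    }
    return out
}

// Validate drafted tokens against the model, stopping at the first mismatch.
func verify(_ context: [Token], _ drafted: [Token]) -> [Token] {
    var ctx = context
    var accepted: [Token] = []
    for guess in drafted {
        let real = modelNextToken(ctx)
        guard real == guess else { break }
        accepted.append(real)
        ctx.append(real)
    }
    return accepted
}

let prompt: [Token] = [1, 2, 3]
print(verify(prompt, draft(prompt, count: 4)))  // [4, 5, 6, 7]
```

The available presets: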
- General
- Summarization
- Classification
With the General preset, no speculation is performed.
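
Put together, the preset enum might look roughly like this. This is a sketch: the case names come from the list above, and only General's behavior (no speculation) is stated by this documentation; the comments on the other two cases are assumptions:

```swift
// Hypothetical shape of the Preset enum.
enum Preset {
    case general         // plain autoregressive decoding, no speculation
    case summarization   // a speculation method suited to summarization (assumed)
    case classification  // a speculation method suited to classification (assumed)
}
```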