Describes the parameters for making an inference request.

InferenceParams

interface InferenceParams {
    extra?: Record<string, any>;
    grammar?: string;
    images?: string[];
    max_tokens?: number;
    min_p?: number;
    model?: ModelConf;
    repeat_penalty?: number;
    stop?: string[];
    stream?: boolean;
    temperature?: number;
    template?: string;
    tfs?: number;
    top_k?: number;
    top_p?: number;
}
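
A minimal sketch of building a request with these parameters. The InferenceParams interface is assumed to be in scope; infer is a hypothetical client function used only for illustration, since the actual call signature depends on the backend client.

// Build a request payload; `infer` is hypothetical, not part of this interface.
const params: InferenceParams = {
    temperature: 0.2,
    max_tokens: 512,
    top_p: 0.95,
    stop: ["</s>"],
};

const result = await infer("List three prime numbers.", params);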

Properties

extra?: Record<string, any>

Extra parameters to include in the request payload.

grammar?: string

The GBNF grammar to use for grammar-based sampling.
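
Grammar-based sampling constrains the output to match a GBNF (GGML BNF) grammar, as used by llama.cpp. A minimal sketch, assuming a backend that supports grammars:

// A GBNF grammar that restricts the model's output to "yes" or "no".
const yesNoGrammar = `root ::= "yes" | "no"`;

const params: InferenceParams = {
    grammar: yesNoGrammar,
    temperature: 0,
};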

images?: string[]

A list of images to include in the request, for multimodal models.

max_tokens?: number

The maximum number of tokens to generate.

min_p?: number

The minimum probability for a token to be considered, relative to the probability of the most likely token.

model?: ModelConf

The model configuration details for inference.

repeat_penalty?: number

Penalty applied to repeated tokens; values above 1 discourage repetition.

stop?: string[]

List of stop words or phrases that halt generation when encountered.

stream?: boolean

Whether to stream tokens progressively as they are generated.
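
When streaming is enabled, tokens arrive incrementally rather than in a single response. A sketch of consuming a stream, assuming a hypothetical streamInfer helper; the real streaming API depends on the client.

const params: InferenceParams = { stream: true };

// `streamInfer` is hypothetical; actual streaming APIs vary by backend.
for await (const token of streamInfer("Tell me a story.", params)) {
    process.stdout.write(token);
}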

temperature?: number

Adjusts randomness in sampling; higher values mean more randomness.

template?: string

The prompt template to use, for backends that support it.

tfs?: number

The tail free sampling value, used to reduce the influence of low-probability tokens.

top_k?: number

Limits sampling to the K most likely tokens.

top_p?: number

Limits sampling to the smallest set of tokens whose cumulative probability exceeds this value (nucleus sampling).
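
The sampling parameters above compose: top_k, top_p, and min_p each prune the candidate set, and temperature scales the remaining distribution. A sketch of a conservative configuration; the values are illustrative, not library defaults.

const conservativeSampling: InferenceParams = {
    temperature: 0.3,    // low randomness
    top_k: 40,           // keep only the 40 most likely tokens
    top_p: 0.9,          // nucleus sampling at 90% cumulative probability
    min_p: 0.05,         // drop tokens below 5% of the top token's probability
    repeat_penalty: 1.1, // mildly discourage repetition
};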