The uzu inference engine lets you run LLMs for tasks such as text generation, information retrieval, classification, and summarization. In this guide, we’ll walk through the full process of integrating an LLM into your app.

Components

Let’s start with a closer look at the entities we’ll be working with later on (a sketch of how they fit together follows the list):
- Engine: the main entry point to the SDK. Use it to refresh the registry, download models, and create inference sessions.
- Model list: the models available for your device. Use it to pick the model you want to run.
- Preset: a prebuilt configuration for an inference session. Choosing the right preset for your task can yield significant performance improvements.
- Session: the main entity for interacting with the model. It keeps the selected model’s weights in memory and lets you send requests to the LLM.
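
To make these roles concrete, here is a minimal Swift sketch of how the four entities might fit together. Every identifier in it (`UzuEngine`, `availableModels`, `SessionConfig`, the `.summarization` preset, `createSession`) is an assumption made for illustration rather than the SDK’s confirmed API, so check the names against the SDK reference.

```swift
import Uzu  // assumed module name, not confirmed by the SDK

// Hypothetical sketch of how the four entities relate; all names are illustrative.
func pickModelAndOpenSession() async throws {
    // Engine: the entry point used to refresh the registry, download models,
    // and create inference sessions.
    let engine = UzuEngine(apiKey: "<YOUR_API_KEY>")

    // Model list: the models available for this device; pick one to run.
    let models = try await engine.availableModels()
    guard let model = models.first else { return }

    // Preset: a prebuilt session configuration matched to the task at hand.
    let config = SessionConfig(preset: .summarization)

    // Session: keeps the model's weights in memory and accepts requests.
    let session = try engine.createSession(model: model, config: config)
    _ = session  // requests are sent through the session from here on
}
```

Whatever the exact names, the ownership chain is the key point: the engine produces everything else, and the session is the only object you talk to at generation time.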

Interaction diagram

Integration

The full integration process consists of the following steps (an end-to-end sketch follows the list):
1. Get an API key
2. Connect and configure the SDK
3. Choose and download a model
4. Choose an inference session configuration
5. Create an inference session and run a model
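
Before we dive into the details, here is a hedged end-to-end sketch of the whole flow. As above, all names are placeholders chosen for illustration (including the `UZU_API_KEY` environment variable, which is hypothetical); the comments map each block to the numbered steps.

```swift
import Foundation
import Uzu  // assumed module name, not confirmed by the SDK

// Hypothetical end-to-end flow; the comments map each block to the steps above.
func runModelEndToEnd() async throws {
    // Steps 1-2: obtain an API key and configure the SDK with it
    // (UZU_API_KEY is a hypothetical environment variable for this sketch).
    let apiKey = ProcessInfo.processInfo.environment["UZU_API_KEY"] ?? "<YOUR_API_KEY>"
    let engine = UzuEngine(apiKey: apiKey)

    // Step 3: refresh the registry, pick a model, and download it.
    try await engine.refreshRegistry()
    guard let model = try await engine.availableModels().first else { return }
    try await engine.download(model)

    // Step 4: choose an inference session configuration via a preset.
    let config = SessionConfig(preset: .general)

    // Step 5: create a session (this loads the weights) and run the model.
    let session = try engine.createSession(model: model, config: config)
    let output = try session.run(prompt: "Classify the sentiment: 'I love this app!'")
    print(output)
}
```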

Let’s take a detailed look at each step.