Launch DIAL Chat with a Self-Hosted Model

Introduction

In this tutorial, you will learn how to quickly launch DIAL Chat with a self-hosted model powered by vLLM.

Prerequisites

Docker Engine installed on your machine, with Docker Compose version 2.20.0 or later.

Refer to the Docker documentation for installation instructions.
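
To confirm the Compose version before continuing, a check along the following lines may help. It is only a sketch: the `version_ge` helper is a hypothetical name introduced here, and it assumes a `sort` that supports `-V` (version ordering, as in GNU coreutils).

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes `sort -V` is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="2.20.0"

if command -v docker >/dev/null 2>&1; then
  # `--short` prints only the Compose version number.
  installed="$(docker compose version --short 2>/dev/null)"
  installed="${installed#v}"   # drop a leading "v" if one is present
  if [ -n "$installed" ] && version_ge "$installed" "$required"; then
    echo "Docker Compose $installed is recent enough"
  else
    echo "Docker Compose $required or later is required (found: ${installed:-none})"
  fi
else
  echo "docker was not found on PATH"
fi
```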

Step 1: Get DIAL

Clone the repository that contains the tutorials, then change to the following directory:

cd dial-docker-compose/vllm

Step 2: Choose a model to run

vLLM supports a wide range of popular open-source models. We'll demonstrate how to integrate a Hugging Face chat model served by vLLM into the DIAL Platform.

Step 3: Launch DIAL Chat

Configure the .env file in the current directory for the model you've chosen:
- Set VLLM_CHAT_MODEL to the name of a chat model. The default is Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4, a lightweight chat model from Hugging Face.
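
As an illustration, the relevant entry in .env might look like the fragment below. It assumes the file uses plain KEY=value lines and shows only the variable named above; any other entries already in the file should be left as they are.

```shell
# Name of the Hugging Face chat model that vLLM should serve (the default from this tutorial).
VLLM_CHAT_MODEL=Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4
```
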
Then run the following command to start the vLLM server and the key DIAL Platform components:
```
docker compose up --abort-on-container-exit
```
Keep in mind that even a lightweight Hugging Face model is typically a few gigabytes in size, so the first run may take a few minutes or more to download it, depending on your internet bandwidth and the size of the model you choose.
Finally, open http://localhost:3000/ in your browser to launch the DIAL Chat application and select the appropriate DIAL deployment to converse with: the Self-hosted chat model deployment, which serves the model set in VLLM_CHAT_MODEL.
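
Because the first start can take a while, it may be convenient to poll the chat URL from a second terminal rather than refreshing the browser. The function below is a minimal sketch of that idea: `wait_for_url` is a hypothetical helper, not part of DIAL, it assumes `curl` is installed, and the default attempt count and delay are arbitrary.

```shell
# Hypothetical helper: poll a URL until it answers without an HTTP error.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url="$1"
  attempts="${2:-60}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors (status 400 and above);
    # --max-time keeps a single attempt from hanging.
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      echo "$url is responding"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "$url did not respond after $attempts attempts" >&2
  return 1
}
```

For this tutorial it would be called as `wait_for_url http://localhost:3000/`. A response from the chat page does not by itself guarantee that the model has finished loading, so the first conversation may still need a short wait.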
