Launch DIAL Chat with a Self-Hosted Model
Introduction
In this tutorial, you will learn how to quickly launch DIAL Chat with a self-hosted model powered by vLLM.
Prerequisites
Docker Engine installed on your machine, with Docker Compose version 2.20.0 or later.
Refer to Docker documentation.
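To confirm the Compose version requirement before going further, a check along the following lines may help. It is a minimal sketch: `version_ge` is a hypothetical helper written for this example, and it assumes a `sort` that supports the `-V` (version sort) option, as GNU coreutils does.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Assumption: the local sort implements -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="2.20.0"

if command -v docker >/dev/null 2>&1; then
  # "docker compose version --short" prints only the version number;
  # a leading "v" is stripped in case the build includes one.
  installed="$(docker compose version --short 2>/dev/null | sed 's/^v//' || true)"
  if [ -n "$installed" ] && version_ge "$installed" "$required"; then
    echo "Docker Compose $installed satisfies the $required requirement"
  else
    echo "Docker Compose $required or later is required (found: ${installed:-none})"
  fi
else
  echo "docker was not found on PATH"
fi
```

The comparison is kept in its own function so that it can be reused for other version checks.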
Step 1: Get DIAL
Clone the repository with the tutorials and change directory to the following folder:
cd dial-docker-compose/vllm
Step 2: Choose a model to run
vLLM supports a wide range of popular open-source models. We'll demonstrate how to integrate a Hugging Face chat model served by vLLM into the DIAL Platform.
Step 3: Launch DIAL Chat
- Configure the .env file in the current directory according to the type of model you've chosen:
  - Set VLLM_CHAT_MODEL to the name of a chat model. The default is Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4, a lightweight chat model from Hugging Face.
- Set
- Then run the following command to start the vLLM server and the key DIAL Platform components:
  docker compose up --abort-on-container-exit
  Keep in mind that a lightweight Hugging Face model is typically a few gigabytes in size, so the first run may take a few minutes or more to download it, depending on your internet bandwidth and the size of the model you choose.
- Finally, open http://localhost:3000/ in your browser to launch the DIAL Chat application and select an appropriate DIAL deployment to converse with:
  - Self-hosted chat model deployment for the VLLM_CHAT_MODEL.
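Because the first run can spend several minutes downloading the model, it can be convenient to check which model is configured and to poll the chat URL instead of refreshing the browser by hand. The sketch below is illustrative only: `read_env_value` and `wait_for_url` are hypothetical helpers, not part of DIAL or vLLM. It assumes the .env file uses plain, unquoted KEY=value lines, that `curl` is installed, and that the chat UI answers with a non-error HTTP status once it is up.

```shell
# Print the value assigned to a key in a dotenv-style file (KEY=value lines).
# Prints nothing when the key is absent; the last assignment wins.
# Assumption: values are not quoted and the key has no regex metacharacters.
read_env_value() {
  key="$1"
  file="$2"
  sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Poll a URL until it answers without an HTTP error status.
# $1 = URL, $2 = maximum number of attempts, $3 = seconds between attempts.
# Returns 0 on the first successful response, 1 if every attempt fails.
wait_for_url() {
  url="$1"
  attempts="$2"
  delay="$3"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `read_env_value VLLM_CHAT_MODEL .env` prints the configured model name if the key is set in the file, and `wait_for_url http://localhost:3000/ 60 10` retries for up to about ten minutes. A response from the chat UI does not by itself guarantee that the model has finished loading, so the deployment may still need a little longer before it answers.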