AI DIAL Components
AI DIAL Core
AI DIAL Core serves as the primary system component, acting as a main integration center, that employs a Unified Protocol (OpenAI compatible) for communication between internal and external clients, including all LLM models and applications to access all its features in a governed and unified manner.
The Unified Protocol supports:
- Streaming
- Token usage (even in the streaming mode)
- Seeds: helps to achieve deterministic results for LLM responses.
- Tools: (formerly known as functions ) are specialized utilities that streamline development by implementing standardized methods for LLMs to access external APIs.
- Multi-modality: allows supporting non-textual communications such as image-to-text, text-to-image, file transfers and more.
- Compatibility with OpenAI
This approach streamlines communication and fosters interoperability by eliminating the need for multiple protocols for each integration. In case of Addons, they are expected to provide own OpenAPI specification.
AI DIAL Core is headless and is the only mandatory component. It includes all the key platform features:
Authentication and Authorization
AI DIAL provides native support for OpenID Connect and OAuth2 and offers integration with various Identity Providers (IDP) such as Azure AD, Auth0, Okta, Microsoft Entra, Google OAuth2, and AWS Cognito where you can define user roles and attributes to support your custom permissions model. Additionally, you can leverage Keycloak to work with even wider range of IDPs.
There are two methods of CORE API calls authorization supported: JWT token and key. Both options provide granular permission management, allowing you to control access to specific functionalities or resources. Additionally, these authorization methods also enable rate and cost control, giving you the ability to manage the frequency of API calls.
Refer to Authentication to learn how to authenticate API keys and JWT and to Access & Cost Control to learn how to implement a custom role-based access policy.
Load Balancer
For self-hosted models, you can use the standard load balancer (LB) capabilities provided by the target cloud platform. As for cloud-deployed models like Azure OpenAI and others, we typically rely on our custom-developed load balancing solution.
In this approach, a configuration file includes multiple upstream endpoints for a model. When a request is received, it is forwarded to one of the endpoints using the round-robin method. If an upstream returns an overload limit error such as a 429 (Too Many Requests) or a 504 (Gateway Timeout), the system attempts another upstream and temporarily excludes the one that generated the error. This strategy ensures efficient load distribution and fault tolerance for optimal performance and reliability. Refer to the document with the overview of the performance tests to learn more.
Refer to Load Balancer to learn more.
Rate Limits & Cost Control
A well-distributed rate-limiting mechanism ensures the control over the total number of tokens that can be sent to a model (typically a one-minute or 24-hour window) by any Application, Addon, or Assistant.
- Refer to Roles & Access Control to learn how to configure limits for API keys and JWT.
Extension Framework
AI DIAL presents a robust Extension Framework and plug-in infrastructure, enabling seamless integration of your data and business workflows with Language Models (LLM) to enrich your enterprise applications.
You can use AI DIAL SDK to develop such extensions. Applications and model Adapters implemented using this framework will be compatible with AI DIAL API that was designed based on Azure OpenAI API.
Refer to AI DIAL SDK and Development Examples to learn more.
Addon: Addon is similar to a concept of tool or function in some other frameworks. Within the AI DIAL framework, an Addon is a service — or any component adhering to its own or provided OpenAPI specification — that empowers LLMs to access and utilize any desired data source or technology to produce their responses.
Application: any custom logic with a conversation (or any custom) interface packaged as a ready-to-use solution. It can be any component conforming with Unified Protocol requirements or registered custom endpoints. Refer to Applications to learn more about DIAL apps and their types.
The Assistant Service is used to enable communication between Addons and the AI DIAL Core. Assistants can range from simple implementations, like instructing the LLM to provide answers using a specific language tone or style, to more complex use cases, such as limiting the LLM's data scope to a particular geographical location. Refer to AI DIAL Assistant repository in GitHub.
Adapter: unifies APIs of respective LLMs to align with the Unified Protocol of AI DIAL Core.
Logging
AI DIAL Core uses Vector (a lightweight, ultra-fast tool for building observability pipelines) to redirect users’ messages to S3, Azure Blob Store, GCP Cloud Storage or any other "sink".
You can gather standard logs (which do not contain user messages) from components using the ELK stack (Elasticsearch, Logstash, Kibana) or other log collection system.
Refer to Observability to learn more.
Entitlements
In AI DIAL Core, user roles are defined and configured in the application config file. This allows administrators to specify which users or user groups are authorized to access specific resources or features within the application. These user roles match the once created in your IDP.
Refer to Authentication to learn about supported authentication methods.
Chat
Chat is a default AI DIAL UI which provides access to the full set of its features.
- Refer to Chat repository in GitHub to see the project source code.
- Refer to User Guide to learn about DIAL Chat features.
Overlay
UI Overlay allows adding Chat to a web application with zero effort by simply inserting a short HTML block.
Refer to Chat Overlay repository in GitHub to learn more.
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
</head>
<body>
<script type="module">
import CHATAIOverlay from "./script.js";
const chatAiOverlay = new CHATAIOverlay("http://localhost:3000");
chatAiOverlay.load();
</script>
</body>
</html>
Themes
Chat Themes are used to customize the styles of Chat UI.
Refer to AI DIAL Chat Themes GitHub repository to learn more.
Persistent Layer
AI DIAL architecture includes a persistent layer, that relies on a resilient and scalable cloud blob storage (you can configure either AWS S3, Google Cloud Storage, Azure Blob Storage or a local file storage) where all conversations, prompts, custom applications and user files will be stored. Redis Cache (either cluster or a standalone) is deployed on top of it to enhance performance.
This architecture facilitates the swift retrieval of stored resources, supporting features such as sharing and publication of conversations and prompts.
Auth Helper
Auth Helper is used to resolve challenges (such as access control issues with the /userinfo
endpoint and retrieving user profile pictures) that may arise during integration with IDPs like Azure AD.
It is a proxy service that implements OpenID-compatible Web API endpoints to avoid direct interaction with such IDPs.
Refer to AI DIAL Auth Helper repository in GitHub to learn more.
Analytics Realtime
The AI DIAL Analytics Realtime tool uses diverse techniques such as embedding algorithms, clustering algorithms, frameworks, light-weight self-hosted language models, to analyze the conversation data and extract the needed results, which can be presented in tools such as Grafana for visualization.
Refer to Analytics Realtime GitHub repository.
Analytics Realtime tool is a sink of vector.dev
. It does not retain any private information, such as user prompts or conversations, beyond the system. Instead, only the computed artifacts are collected and stored in time-series databases like InfluxDB or any scalable database capable of handling voluminous, constantly changing information.
Examples of the computed artifacts:
- Who has used the AI? – user hash, title, and never personal data such as names.
- What areas have people asked questions about?
- Are there any recurring patterns?
- Topics of conversations.
- Unique users.
- Sentiments.
- Cost analysis of the communication.
- Language of conversations.
- Any other calculated statistics based on conversations.
Refer to Tutorials to learn more about configuration and usage of this service.
Extensions
Extensions such as Applications, Addons, Assistants and Adapters can be additionally developed and deployed to communicate with the AI DIAL Core via the Unified Protocol.
You can use AI DIAL SDK to develop such extensions. Applications and model Adapters implemented using this framework will be compatible with AI DIAL API that was designed based on the Azure OpenAI API.
Refer to AI DIAL SDK and Development Examples to learn more.
Extensions have freedom to employ a technology of their preference, be it any LLM framework, LlamaIndex, LangChain, Semantic Kernel, vector DBs or any other.
LLM Adapters
LLM Adapters unify the APIs of respective LLMs to align with the Unified Protocol of AI DIAL Core. Each Adapter operates within a dedicated container. Multi-modality allows supporting non-textual communications such as image-to-text, text-to-image, file transfers and more.
Refer to Azure OpenAI, GCP Vertex AI and AWS Bedrock AI DIAL repositories in GitHub.