Models
About Models
In Models, you can find all language models deployed in DIAL and also add new model deployments.
DIAL allows you to access models from all major LLM providers, language models from the open-source community, alternative vendors, and fine-tuned micro models, as well as self-hosted models or models listed on HuggingFace or DeepSeek.
DIAL can function as an agentic platform, where language models can be used as building blocks in your apps to create multi-modal and multi-agentic solutions.
You can use the DIAL SDK to create custom model adapters. Applications and model adapters implemented with this framework are compatible with the DIAL API, which is based on the Azure OpenAI API.
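Because the DIAL API mirrors the Azure OpenAI API, a chat completion request to a deployed model is sent to a path such as /openai/deployments/&lt;deployment-id&gt;/chat/completions with a familiar JSON body. The sketch below is a minimal illustrative payload; the parameter values are examples, not requirements:

```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Summarize our Q3 support tickets." }
  ],
  "temperature": 0.7,
  "stream": false
}
```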
Refer to the Supported Models page for a list of all supported models and model adapters.
Models List
On the Models page, you can find all language models (LLMs) deployed on your DIAL instance. Here you can view, filter, and add new model definitions.
TIP: Click Columns to open the columns selector to define what columns to display.
Models grid
Field | Definition |
---|---|
Display Name | A user-friendly label for a model (e.g. "GPT-4 Turbo"). Display name is shown in all DIAL client UI dropdowns, tables, and logs so operators can quickly identify the model. |
Version | An optional tag or a label for this model deployment (e.g. 0613 , v1 ). Use it to distinguish between "latest," "beta," or date-stamped builds. |
Description | Free-text notes about this model’s purpose, training data, cost tier, or any other relevant details. |
Deployment ID | This is a unique key under the models section of DIAL Core’s config. Must match the upstream service’s model or deployment name (e.g. gpt-4-0613 ). |
Adapter | The identifier of the connector that handles requests for a model (OpenAI or DIAL). The adapter provides authentication, request formatting, and response parsing for the underlying LLM API. Refer to LLM Adapters to learn more. |
Type | Defines whether this is a Chat model (conversational completions) or an Embedding model (vector generation). DIAL Core uses this to choose the correct API endpoint and payload schema. |
Override Name | An optional, context-specific display label that supersedes Display Name in dropdowns or tables for certain routes or applications. Use it to give a model different aliases in different workflows without redefining the model. |
Topics | Tags or categories (e.g. "finance," "support," "image-capable") you can assign for discovery, filtering, or grouping in large deployments. Helps end users and admins find the right model by the use case. Topics are also used to filter models in DIAL Marketplace. |
Attachment types | Controls which types of attachments this model can accept. |
Max attachment number | Maximum number of attachments allowed per single request. Leave blank for an unlimited number. Prevents requests with an excessive number of files. |
Tokenizer model | Identifies the specific model with a tokenization algorithm identical to the referenced model's. This is typically the name of the earliest released model in a series of models sharing an identical tokenization algorithm. This parameter is essential for DIAL clients that reimplement tokenization algorithms on their side, instead of utilizing the tokenize Endpoint provided by the model. |
Forward auth token | Optionally, configure the system to forward the Auth Token from the caller's session to the upstream API call. This enables multi-tenant scenarios or pass-through authentication for downstream services. |
Interaction limit | The maximum number of tokens that can be transmitted in a completion request and response combined. This parameter ensures that the model does not exceed a specified token limit during interactions. |
Prompt price | Cost per unit (according to Cost unit, typically "token" or "request") applied to the input portion of each call. Used by the Dashboard and Usage Logs to estimate spending in real time. |
Completion price | Cost per unit charged for the output portion of each call. Combined with the prompt price, it determines your per-model cost calculations. |
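Because the Deployment ID is the key under the models section of DIAL Core's config, a minimal model entry in the raw JSON (also visible later through the JSON Editor) looks roughly like the sketch below. The adapter hostname and exact key names are illustrative assumptions; check the JSON Editor of an existing model for the exact schema used by your DIAL Core version.

```json
{
  "models": {
    "gpt-4o": {
      "type": "chat",
      "displayName": "GPT-4o",
      "description": "General-purpose chat model for internal assistants",
      "endpoint": "http://adapter-openai:5000/openai/deployments/gpt-4o/chat/completions"
    }
  }
}
```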
Create Model
- Click + Create to invoke the Create Model modal.
- Define parameters:

Field | Required | Definition & Guidance |
---|---|---|
Deployment ID | Yes | A unique identifier used by the model adapter to invoke the model's backend. |
Display Name | Yes | A user-friendly label shown across the UI (e.g. "GPT-4 Turbo"). |
Version | No | An optional tag to track releases when you register multiple variants of the same model (e.g. 2024-07-18, v1). |
Description | No | Free-text note about the model's purpose or distinguishing traits. |
Adapter | Yes | A model adapter that will handle requests to this model (e.g. OpenAI, DIAL). The chosen adapter supplies authentication, endpoint URL, and request formatting. |

- Click Create to close the dialog and open the configuration screen. When done with model configuration, click Save. It may take some time for the changes to take effect after saving. Once added, the model appears in the Models listing and can be used by Routes and Applications.
Model Configuration
You can access the model configuration screen by clicking any model in the models grid and also when adding a new model. In this section, you can view and configure all settings for the selected language model deployment.
- Properties: Main definitions and runtime settings.
- Features: Optional capabilities and custom endpoints.
- Roles: User groups that can invoke this model and their rate limits.
- Interceptors: Custom logic to modify requests or responses.
- Dashboard: Real-time metrics and usage statistics.
Top Bar Controls
- Delete: Permanently removes the selected model's definition from DIAL Core. All Routes referencing it will throw an error until a replacement is created.
- JSON Editor (Toggle): Switch between the form-based UI and raw JSON view of the model’s configuration. Use JSON mode for copy-paste or advanced edits.
Properties
In the Properties tab, you can view and edit main definitions and runtime settings for model deployment.
- Basic identification: Deployment ID, Display Name, Version, Description.
- Adapter & Endpoint: Select the Adapter, API Type (Chat or Embedding), and read-only Endpoint URL.
- Presentation & Attachments: Override name, icon, topics, and attachment types.
- Upstream Configuration: Define upstream endpoints, authentication keys, weights, and extra data.
- Advanced Options: Tokenizer model, forward auth token, interaction limits, retry attempts.
- Cost Configuration: Set cost unit, prompt price, and completion price for real-time billing.
Basic Identification
Field | Required | Description |
---|---|---|
Deployment ID | Yes | A unique key DIAL Core uses in the models section. Must match the upstream’s deployment or model name (e.g. gpt-4o , gpt-4-turbo ). Routes refer to this ID when selecting a model. |
Display Name | Yes | User-friendly label shown in tables and dropdowns in DIAL clients (e.g. "GPT-4o"). Helps users identify and select models on UI. |
Version | No | An optional version tag for tracking releases (e.g. 0613 , v1 ). Useful for A/B testing or canary rollouts. |
Description | No | Free-text note describing the model’s purpose, fine-tune details, or its cost tier. |
Adapter & Endpoint
Field | Required | Description |
---|---|---|
Adapter | Yes | An option to select a model adapter (connector) to handle requests to this model deployment (e.g. OpenAI, DIAL). The adapter defines how to authenticate, format payloads, and parse responses. |
Type | Yes | A choice between Chat or Embedding API. Chat - for conversational chat completions. Embedding - for vector generation (semantic search, clustering). |
Endpoint | Yes | Read-only URL that DIAL Core will invoke for this model/type. Auto-populated based on the model adapter and deploymentId when the model was created. |
Presentation & Attachments
Field | Required | Description |
---|---|---|
Override Name | No | Custom display name for specific contexts. |
Icon | No | A logo to visually distinguish models in the UI. |
Topics | No | A tag that associates a model with one or more topics or categories (e.g. "finance", "support"). |
Attachments | No | An option to select the attachment types (images, files) this model can accept. None – no attachments allowed. All – unrestricted types; optionally specify the max number of attachments. Custom – specific MIME types; optionally specify the max number of attachments. |
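In the raw JSON, these presentation and attachment settings typically appear as keys on the model entry. The sketch below assumes key names along the lines of iconUrl, inputAttachmentTypes, and maxInputAttachments; verify the exact names (and the key used for topics) in the JSON Editor:

```json
{
  "models": {
    "gpt-4o": {
      "displayName": "GPT-4o",
      "iconUrl": "https://example.com/icons/gpt-4o.svg",
      "inputAttachmentTypes": ["image/png", "image/jpeg"],
      "maxInputAttachments": 5
    }
  }
}
```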
Upstream Configuration
Field | Required | Description |
---|---|---|
Upstream Endpoints | Yes | One or more backend URLs to send requests to. Enables round-robin load balancing or fallback among multiple hosts. Refer to Load Balancer to learn more. |
Keys | No | API key, token, or credential passed to the upstream. Stored securely and masked—click the eye icon to reveal. |
Weight | Yes | Numeric weight for this endpoint in a multi-upstream scenario. Higher = more traffic share. |
Tier | No | Specifies an endpoint group. In a regular scenario, all requests are routed to endpoints with the lowest tier; in case of an outage or hitting the limits, the next tier in line helps handle the load. |
Extra Data | No | Free-form JSON or string metadata passed to the model adapter with each request. |
+ Add Upstream | — | An option for registering additional endpoints if you need fail-over or capacity scaling. |
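In the raw JSON, upstream endpoints are usually expressed as an upstreams array on the model entry. Below is a minimal sketch with two hosts; the URLs and key placeholders are invented for illustration:

```json
{
  "upstreams": [
    {
      "endpoint": "https://eastus.example.com/openai/deployments/gpt-4o/chat/completions",
      "key": "UPSTREAM_KEY_EAST",
      "weight": 2,
      "tier": 0
    },
    {
      "endpoint": "https://westus.example.com/openai/deployments/gpt-4o/chat/completions",
      "key": "UPSTREAM_KEY_WEST",
      "weight": 1,
      "tier": 0
    }
  ]
}
```

Here both endpoints sit in tier 0, so traffic is split roughly 2:1 by weight; an endpoint with a higher tier value would receive traffic only when the lower tier is unavailable or has hit its limits.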
Advanced Options
Field | Required | Description |
---|---|---|
Tokenizer Model | No | Identifies the specific model whose tokenization algorithm exactly matches that of the referenced model. This is typically the name of the earliest released model in a series of models sharing an identical tokenization algorithm. This parameter is essential for DIAL clients that reimplement tokenization algorithms on their side, instead of utilizing the tokenize endpoint provided by the model. |
Forward auth token | No | When enabled, the auth token from the user’s session is forwarded to the upstream API call, enabling multi-tenant or pass-through authentication scenarios for downstream services. |
Interaction limit | No | This parameter ensures that the model does not exceed a specified token limit during interactions. Available values: None - DIAL does not apply any additional interaction limits beyond limits that your model enforces natively. Ideal for early prototyping or when you trust the LLM’s built-in safeguards. Total Number of Tokens - enforces a single, cumulative cap on the sum of all prompt + completion tokens across the entire chat. Separately Prompts and Completions - two independent limits: one on the sum of all input (prompt) tokens and another on the sum of all output (completion) tokens over the course of a conversation. |
Max retry attempts | No | The number of times DIAL Core will retry a connection in case of upstream errors (e.g. on timeouts or 5xx responses). |
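A hedged sketch of how these advanced options might look in the raw JSON. The key names below (tokenizerModel, forwardAuthToken, maxRetryAttempts, and the interaction-limit block) are assumptions for illustration only; confirm the exact schema in the JSON Editor before relying on them:

```json
{
  "tokenizerModel": "gpt-4o-2024-05-13",
  "forwardAuthToken": false,
  "maxRetryAttempts": 2,
  "interactionLimits": {
    "totalTokens": 128000
  }
}
```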
Cost Configuration
Enables real-time cost estimation and quota enforcement. Powers the telemetry dashboard with per-model spending metrics.
Field | Required | Description |
---|---|---|
Cost unit | Yes | Base unit for billing. Available values: None - disables all cost tracking for this model. Tokens - every token sent or received by the model is counted towards your cost metrics. Char without whitespace - tells DIAL to count only non-whitespace characters (letters, numbers, punctuation) in each request as the billing unit. |
Prompt price | Yes | Cost per unit for prompt tokens. |
Completion price | Yes | Cost per unit for completion tokens (chat responses). |
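In the raw JSON, these values usually live under a pricing block with unit, prompt, and completion keys; the prices below are placeholders, not real rates. With token pricing, a call that consumed 1,000 prompt tokens and 500 completion tokens would be estimated as 1,000 × prompt price + 500 × completion price.

```json
{
  "pricing": {
    "unit": "token",
    "prompt": "0.00001",
    "completion": "0.00003"
  }
}
```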
Features
In the Features tab, you can enable, disable, or override optional capabilities for a specific model. Use a model's features to tailor DIAL Core’s Unified Protocol behavior: turn features on when your model supports them, or off when it doesn’t.
TIPS: Enable only the features you need; extra toggles can cause errors if the upstream doesn’t support them. After setting a custom endpoint, test it via a simple API call to confirm accessibility and authentication.
Custom Feature Endpoints
Some model adapters expose specialized HTTP endpoints for tokenization, rate estimation, prompt truncation, or live configuration. You can override the default Unified Protocol calls by specifying them in this section.
Field | Description & When to Use |
---|---|
Rate endpoint | URL to invoke the model’s cost-estimation or billing API, which returns token counts and credit usage. Override it if your adapter supports a dedicated "rate" path. |
Tokenize endpoint | URL to invoke a standalone tokenization service. Use when you need precise token counts before truncation or batching. Models without built-in tokenization require this. |
Truncate prompt endpoint | URL to invoke a prompt‐truncation API. Ensures prompts are safely cut to max context length. Useful when working with very long user inputs. |
Configuration endpoint | URL to fetch model‐specific settings (e.g. max tokens, allowed parameters). Provide only for "configurable" deployments. |
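When an adapter provides these endpoints, they are typically configured under the model's features block in the raw JSON. A sketch, with the adapter hostname and paths shown as placeholders:

```json
{
  "features": {
    "rateEndpoint": "http://adapter-openai:5000/openai/deployments/gpt-4o/rate",
    "tokenizeEndpoint": "http://adapter-openai:5000/openai/deployments/gpt-4o/tokenize",
    "truncatePromptEndpoint": "http://adapter-openai:5000/openai/deployments/gpt-4o/truncate_prompt",
    "configurationEndpoint": "http://adapter-openai:5000/openai/deployments/gpt-4o/configuration"
  }
}
```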
Feature Flags (Toggles)
Each toggle corresponds to a capability in the Unified Protocol. Enable them only if your model and adapter fully support that feature.
Toggle | What It Does |
---|---|
Temperature | Enables the temperature parameter in API calls. Controls randomness vs. determinism. |
System prompt | Allows injecting a system‐level message (the "agent’s instructions") at the start of every chat. Disable for models that ignore or block system prompts. |
Tools | Enables the tools (a.k.a. functions) feature for safe external API calls. Enable if you plan to use DIAL Add-ons or function calling. |
Seed | Enables the seed parameter for deterministic output. Use in testing or reproducible workflows. |
URL Attachments | Allows passing URLs as attachments (images, docs) to the model. Can be required for image-based or file-referencing prompts. |
Folder Attachments | Enables attaching folders (batching multiple files). |
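The toggles map to boolean flags in the same features block of the raw JSON. The flag names below follow the pattern commonly used in DIAL Core configs but should be treated as assumptions and verified against your version:

```json
{
  "features": {
    "temperatureSupported": true,
    "systemPromptSupported": true,
    "toolsSupported": true,
    "seedSupported": false,
    "urlAttachmentsSupported": true,
    "folderAttachmentsSupported": false
  }
}
```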
Roles
You can create and manage roles in the Access Management section.
In the Roles tab, you can define user groups that are authorized to use a specific model and enforce per-role rate limits. This is essential for multi-tenant governance, quota enforcement, and cost control across teams or customers, preventing runaway costs by enforcing a hard ceiling.
Important: if roles are not specified for a specific model, the model will be available to all users.
Refer to Access & Cost Control to learn more about roles and rate limits in DIAL.
Roles grid
Column | Description & Guidance |
---|---|
Name | A unique role's identifier. |
Description | A user-friendly explanation of the role’s purpose (e.g., "DIAL Prompt Engineering Team"). |
Tokens per minute | Per-minute token limit for a specific role. Blank = no limit; inherits the default value. Can be overridden. |
Tokens per day | Daily token limit for a specific role. Blank = no limit; inherits the default value. Can be overridden. |
Tokens per week | Weekly token limit for a specific role. Blank = no limit; inherits the default value. Can be overridden. |
Tokens per month | Monthly token limit for a specific role. Blank = no limit; inherits the default value. Can be overridden. |
Actions | Additional role-specific actions. When the Make available to specific roles toggle is off, you can open the Roles section in a new tab. When the toggle is on, you can open the Roles section in a new tab, set no limits, or remove the role from the list. |
Set Rate Limits
The grid on the Roles screen lists the roles that can access a specific model. Here, you can also set individual limits for selected roles. For example, you can give "Admin" role unlimited monthly tokens but throttle "Developer" to 100,000 tokens/day or allow the "External Partner" role a small trial quota (e.g., 10,000 tokens/month) before upgrade.
To set or change rate limits for a role:
- Click in the desired cell (e.g., Tokens per day for the "ADMIN").
- Enter a numeric limit or leave the cell blank to allow unlimited access. Click Reset to default limits to restore default settings for all roles.
- Click Save to apply changes.
Default Rate Limits
Default limits apply to all roles in the Roles grid; however, you can override them for individual roles as needed.
Field | Description |
---|---|
Default tokens per minute | The maximum tokens any user can consume per minute unless a specific limit is in place. |
Default tokens per day | The maximum tokens any user can consume per day unless a specific limit is in place. |
Default tokens per week | The maximum tokens any user can consume per week unless a specific limit is in place. |
Default tokens per month | The maximum tokens any user may consume per month unless a specific limit is in place. |
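In the raw DIAL Core config, per-role limits are typically defined under a top-level roles section and keyed by deployment ID. The sketch below mirrors the earlier example of throttling a developer role to 100,000 tokens per day; the role names, numbers, and the way defaults are expressed are illustrative assumptions:

```json
{
  "roles": {
    "default": {
      "limits": {
        "gpt-4o": { "minute": 20000, "day": 1000000 }
      }
    },
    "developer": {
      "limits": {
        "gpt-4o": { "minute": 10000, "day": 100000 }
      }
    }
  }
}
```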
Role-Specific Access
Use the Make available to specific roles toggle to define access to the model:
- Off: Model is callable by any authenticated user. All existing user roles are in the grid.
- On: Model is restricted - only the roles you explicitly add to the grid can invoke it.
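When the toggle is On, the restriction is commonly reflected as a list of allowed roles on the model entry itself, for example a userRoles array (the key name is an assumption; confirm it in the JSON Editor):

```json
{
  "models": {
    "gpt-4o": {
      "userRoles": ["admin", "developer"]
    }
  }
}
```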
Add
You can add a role only if Make available to specific roles toggle is On.
- Click + Add (top-right of the Roles Grid).
- Select one or more roles in the modal. The list of roles is defined in the Access Management section.
- Confirm to add role(s) to the table.
Remove
You can remove a role only if Make available to specific roles toggle is On.
- Click the actions menu in the role's line.
- Choose Remove in the menu.
Interceptors
DIAL uses Interceptors to add custom logic to in/out requests for models and apps, enabling PII obfuscation, guardrails, safety checks, and beyond.
You can define Interceptors in the Builders → Interceptors section to add them to the processing pipeline of DIAL Core.
Refer to Interceptors to learn more.
Interceptors Grid
Column | Description |
---|---|
Order | Execution sequence. Interceptors run in ascending order (1 → 2 → 3...). A request flows through each interceptor in this order. Response interceptors are invoked in the reverse order. |
Name | The interceptor’s alias, matching the Name field in its definition. |
Description | Free-text summary from the interceptor’s definition, explaining its purpose. |
Actions | Additional interceptor-specific actions: open the interceptor in a new tab or remove the selected interceptor from the model's configuration. |
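In the raw JSON, attached interceptors typically appear on the model entry as an ordered list that references definitions from a top-level interceptors section. A sketch; the interceptor names and endpoints are invented for illustration:

```json
{
  "interceptors": {
    "pii-obfuscator": {
      "endpoint": "http://interceptor-pii:5000/api/v1/invoke"
    },
    "guardrails": {
      "endpoint": "http://interceptor-guardrails:5000/api/v1/invoke"
    }
  },
  "models": {
    "gpt-4o": {
      "interceptors": ["pii-obfuscator", "guardrails"]
    }
  }
}
```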
Add
- Click + Add (in the upper-right of the interceptors grid).
- In the Add Interceptors modal, choose one or more from the grid of defined interceptors.
- Click Apply to append them to the bottom of the list (they are added in the same order as selected in the modal).
TIP: If you need a new interceptor, first create it under Builders → Interceptors and then revisit this tab to attach it to the model's configuration.
Reorder
- Drag & Drop the handle (⋮⋮⋮⋮) to reassign the order in which interceptors are triggered.
- Release to reposition; order renumbers automatically.
- Click Save to lock in the new execution sequence.
Remove
- Click the actions menu in the interceptor's row.
- Choose Remove to detach it from this model.
- Click Save to lock in the updated interceptors list.
Dashboard
TIP: You can monitor the entire system's metrics in Telemetry.
In the Dashboard tab, you can monitor real-time and historical metrics for the model. You can use it to monitor usage patterns, enforce SLAs, optimize costs, and troubleshoot anomalies.
Top Bar Controls
Control | What It Does |
---|---|
Time Period | Selects the date range for all charts and tables (e.g. last 15 min, 2 days, 7 days, 30 days). |
+ Add filter | A filter with options to drill into a specific project. |
Auto refresh | Set the dashboard to poll for new data (e.g. every 1 min) or turn off auto-refresh. |
System Usage Chart
A time-series line chart of request throughput over time. You can use it to monitor traffic peaks and valleys and correlate spikes with deployments or feature rollouts.
Key Metrics
Four high-level metrics are displayed alongside the chart, all calculated for the selected time period.
You can use them to:
- Chargeback to internal teams or external customers by "Money".
- Track adoption via "Unique Users".
- Monitor burst traffic with "Request Count".
- Watch token consumption to anticipate quota exhaustion.
Metric | Definition |
---|---|
Unique Users | Count of distinct user IDs or API keys that have called this model. |
Request Count | Total number of chat or embedding calls routed to this model. |
Total Tokens | Sum of prompt + completion tokens consumed by this model. |
Money | Estimated spending on this model. |
Projects Consumption Table
This table shows the KPIs breakdown by Project. You can use it to compare consumption across multiple projects.
Column | Description |
---|---|
Project | The project utilizing this model. |
Request Count | Number of calls directed to the model. |
Prompt tokens | Total tokens submitted in the prompt portion of requests. |
Completion tokens | Total tokens returned by the model as responses. |
Money | Estimated costs. |
JSON Editor
For advanced scenarios, such as bulk updates, copy/paste between environments, or tweaking settings not exposed in the form UI, you can switch to the JSON Editor on any model’s configuration page.
Switching to the JSON Editor
- Navigate to Entities → Models, then select the model you want to edit.
- Click the JSON Editor toggle (top-right). The UI reveals the raw JSON.
TIP: You can switch between UI and JSON only if there are no unsaved changes.