Interceptors
Introduction
Refer to DIAL Interceptors Python SDK for comprehensive information, configuration, and implementation examples.
Interceptors act as middleware that modifies incoming requests or outgoing responses according to specific logic. In DIAL, we use interceptors to facilitate the implementation of a so-called Responsible AI approach and enforce compliance with internal and external privacy regulations and policies.
For example, interceptors can block requests that violate specific regulations, relate to restricted domains, or could lead to data leaks or biased responses. Interceptors can also restrict applications or models to responding only on specific subjects, anonymize Personally Identifiable Information (PII) in user requests, or cache LLM responses.
Watch a demo video to learn more about interceptors.
Technically speaking, interceptors in DIAL are components inserted into deployments (applications or model adapters) that can be called before or after chat completion requests.
Interceptors in DIAL can be classified into the following categories:
- Pre-interceptors that only modify the incoming request from the client (e.g. rejecting requests that match certain criteria)
- Post-interceptors that only modify the response received from the upstream (e.g. censoring the response)
- Generic interceptors that modify both the incoming request and the response from the upstream (e.g. caching the responses)
For example, to implement PII anonymization for all data sent to models through DIAL, you can use a generic interceptor that employs locally deployed NLP models to obfuscate PII in requests by replacing it with tokens (the pre-interceptor part) and to restore it in responses (the post-interceptor part), so that no personal data reaches the model.
For illustration, the diagram below shows the flow of requests when two interceptors are configured. Every request and response goes through DIAL Core (omitted from the diagram for brevity):
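The tokenize-and-restore idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple regular expression for email addresses instead of an NLP model, and the names `anonymize`, `restore`, and the `<PII_n>` token format are hypothetical, not part of the DIAL Interceptors SDK.

```python
import re
from typing import Dict, Tuple

# Assumption: a deliberately simple email pattern stands in for a real NLP model.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def anonymize(text: str) -> Tuple[str, Dict[str, str]]:
    """Request side: replace each email with a placeholder token."""
    mapping: Dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        token = f"<PII_{len(mapping)}>"  # hypothetical token format
        mapping[token] = match.group(0)
        return token

    return EMAIL_RE.sub(_swap, text), mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Response side: put the original values back in place of the tokens."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

The mapping returned by `anonymize` has to be kept for the lifetime of the request so that `restore` can decode the model's response; the model itself only ever sees the tokens.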
Client -> (original request) ->
Interceptor 1 -> (modified request #1) ->
Interceptor 2 -> (modified request #2) ->
Upstream -> (original response) ->
Interceptor 2 -> (modified response #1) ->
Interceptor 1 -> (modified response #2) ->
Client
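The ordering in the diagram (requests pass through interceptors first to last, responses last to first) can be reproduced with nested function wrappers. The sketch below is a simplification: in DIAL every hop goes through DIAL Core, whereas here each interceptor calls the next handler directly. The names `make_interceptor` and `build_chain` are hypothetical.

```python
from typing import Callable, List

Handler = Callable[[str], str]  # takes a request, returns a response


def make_interceptor(name: str, log: List[str]) -> Callable[[Handler], Handler]:
    """Build a generic interceptor that tags both the request and the response."""
    def wrap(next_handler: Handler) -> Handler:
        def handle(request: str) -> str:
            log.append(f"{name}: request")
            response = next_handler(f"{request} [{name}]")
            log.append(f"{name}: response")
            return f"{response} [{name}]"
        return handle
    return wrap


def build_chain(interceptors: List[Callable[[Handler], Handler]],
                upstream: Handler) -> Handler:
    """Wrap the upstream so that the first interceptor is the outermost layer."""
    handler = upstream
    for wrap in reversed(interceptors):
        handler = wrap(handler)
    return handler
```

Calling the chain built from two interceptors records the events in the order request 1, request 2, response 2, response 1, which matches the diagram.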
DIAL Core handles chat completion requests from interceptors through a single endpoint: /openai/deployments/interceptor/chat/completions. The deployment name interceptor is reserved for requests from all interceptors. Upon receiving such a request, DIAL Core identifies the next interceptor in the chain from the request's per-request API key. The last element in the sequence is always the target deployment (application or model).
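To make the routing idea concrete, here is a toy model of key-based lookup. It is not DIAL Core's implementation: how DIAL Core actually stores and issues per-request keys is not described in this document, so the `Router` class, its methods, and the choice to issue a fresh key per hop are all assumptions made for illustration.

```python
import secrets
from typing import Dict, List, Optional, Tuple


class Router:
    """Hypothetical lookup table: per-request key -> next hop in the chain."""

    def __init__(self) -> None:
        self._next: Dict[str, Tuple[List[str], int]] = {}

    def _issue(self, hops: List[str], index: int) -> str:
        key = secrets.token_hex(16)
        self._next[key] = (hops, index)
        return key

    def start(self, interceptors: List[str], target: str) -> str:
        """Register a chain; the target deployment is always the last hop."""
        return self._issue(list(interceptors) + [target], 0)

    def resolve(self, key: str) -> Tuple[str, Optional[str]]:
        """Return the next hop and the key that hop should use to call back."""
        hops, index = self._next[key]
        name = hops[index]
        if index + 1 < len(hops):
            return name, self._issue(hops, index + 1)
        return name, None  # target deployment: nothing left to route to
```

Because the key alone determines the next hop, every interceptor can call the same reserved endpoint without knowing its own position in the chain.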
Interceptors SDK
You can use the DIAL Interceptors Python SDK to create custom interceptors. See Examples for reference implementations.
DIAL Core Configuration
Refer to DIAL Core configuration for detailed configuration guidelines and examples.
Interceptors can be defined and assigned in DIAL Core dynamic settings. DIAL administrators can add and assign interceptors via the DIAL Admin Panel.
Step 1: Add
Interceptors that you want to use with your deployments can be defined in the interceptors section in the DIAL Core dynamic settings.
Refer to DIAL Core documentation for a detailed description of configuration parameters.
Example of DIAL Core configuration
"interceptors": {
  "gpt-cache": {
    "endpoint": "${INTERCEPTOR_SERVICE_URL}/openai/deployments/gpt-cache/chat/completions",
    "description": "description"
  },
  "pii-anonymizer": {
    "endpoint": "${INTERCEPTOR_SERVICE_URL}/openai/deployments/pii-anonymizer/chat/completions",
    "description": "description"
  }
}
...
Step 2: Assign
You can assign declared interceptors to your deployments (applications or models) in the interceptors array of each deployment in the DIAL Core dynamic settings. Interceptors are invoked in the order in which they are listed.
Refer to DIAL Core configuration for a complete example.
Example of DIAL Core configuration
{
  "models": {
    "chat-gpt-4": {
      "interceptors": ["gpt-cache", "pii-anonymizer"]
    },
    ...
  },
  ...
}
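Since an interceptor must be declared (Step 1) before it can be assigned (Step 2), a quick consistency check on the settings can catch typos early. The helper below is a sketch and not a DIAL tool: the function name is hypothetical, and it assumes the settings have already been parsed into a dictionary with `interceptors`, `models`, and (by assumption) `applications` sections.

```python
from typing import Dict, List


def undeclared_interceptors(config: dict) -> Dict[str, List[str]]:
    """Map each deployment to the interceptors it uses that are not declared."""
    declared = set(config.get("interceptors", {}))
    problems: Dict[str, List[str]] = {}
    # Assumption: deployments live under "models" and "applications".
    for section in ("models", "applications"):
        for name, deployment in config.get(section, {}).items():
            missing = [
                interceptor
                for interceptor in deployment.get("interceptors", [])
                if interceptor not in declared
            ]
            if missing:
                problems[name] = missing
    return problems
```

An empty result means every assigned interceptor has a matching declaration.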
Flow
To demonstrate the flow, let's take two interceptors, gpt-cache and pii-anonymizer, configured for the GPT-4 model:
- DIAL Core receives a request from DIAL Chat to query the GPT-4 model.
- The first interceptor, gpt-cache, checks the cache for the request. If a cached response is found, it is returned to DIAL Core; otherwise, the request is forwarded to pii-anonymizer.
- The pii-anonymizer interceptor anonymizes any personally identifiable information (PII) in the request and forwards it on toward GPT-4.
- After all interceptors have processed the request, DIAL Core sends it directly to the GPT-4 model.
- DIAL Core retrieves the response from GPT-4 and forwards it to pii-anonymizer.
- The pii-anonymizer interceptor restores the original PII in the response and passes it to gpt-cache.
- The gpt-cache interceptor stores the response in the cache and returns it to DIAL Core.
- DIAL Core sends the final response back to DIAL Chat.
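The steps above can be condensed into a single function to show where the cache short-circuits the flow. This is a minimal sketch: the `handle` function, the in-memory `cache` dictionary, and the fixed `pii` mapping of values to tokens are all hypothetical stand-ins for the real gpt-cache and pii-anonymizer interceptors.

```python
from typing import Callable, Dict


def handle(prompt: str,
           cache: Dict[str, str],
           pii: Dict[str, str],
           model: Callable[[str], str]) -> str:
    # gpt-cache, request side: on a hit, nothing further down the chain runs.
    if prompt in cache:
        return cache[prompt]

    # pii-anonymizer, request side: replace known PII values with tokens.
    masked = prompt
    for value, token in pii.items():
        masked = masked.replace(value, token)

    # The target model only ever sees the masked request.
    answer = model(masked)

    # pii-anonymizer, response side: restore the original values.
    for value, token in pii.items():
        answer = answer.replace(token, value)

    # gpt-cache, response side: store the restored response for next time.
    cache[prompt] = answer
    return answer
```

Note that with this ordering the cache sits in front of the anonymizer, so the cached entry contains the restored response with the original PII; listing the interceptors in the opposite order would change what is stored.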