Version: 3.17

ai-rag

Description

The ai-rag Plugin adds Retrieval-Augmented Generation (RAG) capabilities to LLM requests. It retrieves relevant documents or information from external data sources and uses them to enhance the LLM response, which improves the accuracy and contextual relevance of the generated output.

The Plugin currently supports Azure OpenAI for generating embeddings and Azure AI Search for performing vector search. PRs that add support for other service providers are welcome.

Plugin Attributes

| Name | Type | Required | Default | Valid values | Description |
|------|------|----------|---------|--------------|-------------|
| embeddings_provider | object | True | | | Embedding model provider configurations. |
| embeddings_provider.azure_openai | object | True | | | Azure OpenAI embedding model configurations. |
| embeddings_provider.azure_openai.endpoint | string | True | | | Azure OpenAI embedding model endpoint. |
| embeddings_provider.azure_openai.api_key | string | True | | | Azure OpenAI API key. |
| vector_search_provider | object | True | | | Vector search provider configurations. |
| vector_search_provider.azure_ai_search | object | True | | | Configurations of Azure AI Search. |
| vector_search_provider.azure_ai_search.endpoint | string | True | | | Azure AI Search endpoint. |
| vector_search_provider.azure_ai_search.api_key | string | True | | | Azure AI Search API key. |

Request Body Format

The following fields must be present in the request body.

| Field | Type | Description |
|-------|------|-------------|
| ai_rag | object | Request body RAG specifications. |
| ai_rag.embeddings | object | Request parameters required to generate embeddings. Contents depend on the API specification of the configured provider. |
| ai_rag.vector_search | object | Request parameters required to perform vector search. Contents depend on the API specification of the configured provider. |
  • Parameters of ai_rag.embeddings

    • Azure OpenAI
    | Name | Required | Type | Description |
    |------|----------|------|-------------|
    | input | True | string | Input text used to compute embeddings, encoded as a string. |
    | user | False | string | A unique identifier representing your end user, which can help in monitoring and detecting abuse. |
    | encoding_format | False | string | The format to return the embeddings in. Can be either float or base64. Defaults to float. |
    | dimensions | False | integer | The number of dimensions the resulting output embeddings should have. It should match the dimension of your embedding model. For instance, the dimensions for text-embedding-ada-002 are fixed at 1536. For text-embedding-3-small and text-embedding-3-large, the value can range from 1 to 1536 and from 1 to 3072, respectively. |

    For other parameters, refer to the Azure OpenAI embeddings documentation.

  • Parameters of ai_rag.vector_search

    • Azure AI Search
    | Field | Required | Type | Description |
    |-------|----------|------|-------------|
    | fields | True | string | Fields for the vector search. |

    For other parameters, refer to the Azure AI Search documentation. The Azure AI Search vector query parameters are also supported.

Example request body:

{
  "ai_rag": {
    "vector_search": { "fields": "contentVector" },
    "embeddings": {
      "input": "which service is good for devops",
      "dimensions": 1024
    }
  }
}
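
When the input text comes from a user rather than a fixed string, the body has to be assembled with proper escaping. The following is a minimal sketch of one way to do that in plain shell. The helper names `json_escape` and `build_rag_body` are hypothetical and not part of APISIX, and the sketch escapes only backslashes and double quotes, so input containing newlines or other control characters would need a real JSON tool.

```shell
# Escapes backslashes and double quotes so a value can sit inside a JSON string.
# Control characters such as newlines are NOT handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Prints an ai-rag request body on one line.
#   $1: vector field name   $2: input text   $3: embedding dimensions
build_rag_body() {
  case "$3" in
    ''|*[!0-9]*) echo "dimensions must be a whole number" >&2; return 1 ;;
  esac
  printf '{"ai_rag":{"vector_search":{"fields":"%s"},"embeddings":{"input":"%s","dimensions":%s}}}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$3"
}

build_rag_body "contentVector" "which service is good for devops" 1024
```

The printed body can be passed to curl with `-d "$(build_rag_body ...)"`.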

Examples

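To follow along with the example, create an Azure account and set up the resources that the variables below refer to: an Azure OpenAI chat model deployment (gpt-4o here), an embedding model deployment (text-embedding-3-large here), and an Azure AI Search index (vectest here).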

Save the API keys and endpoints to environment variables:

# replace with your values

AZ_OPENAI_DOMAIN=https://ai-plugin-developer.openai.azure.com
AZ_OPENAI_API_KEY=9m7VYroxITMDEqKKEnpOknn1rV7QNQT7DrIBApcwMLYJQQJ99ALACYeBjFXJ3w3AAABACOGXGcd
AZ_CHAT_ENDPOINT=${AZ_OPENAI_DOMAIN}/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-15-preview
AZ_EMBEDDING_MODEL=text-embedding-3-large
AZ_EMBEDDINGS_ENDPOINT=${AZ_OPENAI_DOMAIN}/openai/deployments/${AZ_EMBEDDING_MODEL}/embeddings?api-version=2023-05-15

AZ_AI_SEARCH_SVC_DOMAIN=https://ai-plugin-developer.search.windows.net
AZ_AI_SEARCH_KEY=IFZBp3fKVdq7loEVe9LdwMvVdZrad9A4lPH90AzSeC06SlR
AZ_AI_SEARCH_INDEX=vectest
AZ_AI_SEARCH_ENDPOINT=${AZ_AI_SEARCH_SVC_DOMAIN}/indexes/${AZ_AI_SEARCH_INDEX}/docs/search?api-version=2024-07-01

Note: You can fetch the admin_key from config.yaml and save it to an environment variable with the following command:

admin_key=$(yq '.deployment.admin.admin_key[0].key' conf/config.yaml | sed 's/"//g')
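
Before moving on, it can help to confirm that none of the variables above ended up empty, for example because of a typo or a failed yq lookup. The following is a small sketch of such a check. `check_vars` is a hypothetical helper name, not something APISIX provides.

```shell
# Reports each given variable name that is unset or empty.
# Returns non-zero if at least one is missing.
check_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_vars AZ_OPENAI_API_KEY AZ_CHAT_ENDPOINT AZ_EMBEDDINGS_ENDPOINT \
  AZ_AI_SEARCH_KEY AZ_AI_SEARCH_ENDPOINT admin_key \
  || echo "set the variables listed above before continuing" >&2
```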

Integrate with Azure for RAG-Enhanced Responses

The following example demonstrates how you can use the ai-proxy Plugin to proxy requests to an Azure OpenAI LLM and use the ai-rag Plugin to generate embeddings and perform vector search to enhance the LLM responses.
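
The request in this example targets a Route at /rag that has both Plugins enabled. The following is a minimal sketch of how such a Route might be created through the Admin API, using the ai-rag attributes from the table above and the environment variables saved earlier. Several parts are assumptions rather than facts from this page: the Admin API address 127.0.0.1:9180, the route ID `ai-rag-route`, and the helper names `rag_route_payload` and `create_rag_route`. The ai-proxy configuration is not described here, so the sketch takes it as a JSON argument and passes it through unchanged.

```shell
# Prints a Route definition with the ai-rag Plugin configured from the
# environment variables saved earlier.
#   $1: ai-proxy Plugin configuration as a JSON object (see the ai-proxy docs)
rag_route_payload() {
  [ -n "${1:-}" ] || { echo "usage: rag_route_payload '<ai-proxy JSON>'" >&2; return 1; }
  cat <<EOF
{
  "uri": "/rag",
  "plugins": {
    "ai-rag": {
      "embeddings_provider": {
        "azure_openai": {
          "endpoint": "${AZ_EMBEDDINGS_ENDPOINT}",
          "api_key": "${AZ_OPENAI_API_KEY}"
        }
      },
      "vector_search_provider": {
        "azure_ai_search": {
          "endpoint": "${AZ_AI_SEARCH_ENDPOINT}",
          "api_key": "${AZ_AI_SEARCH_KEY}"
        }
      }
    },
    "ai-proxy": $1
  }
}
EOF
}

# Sends the Route to the Admin API. The address and route ID are assumptions;
# adjust them to your deployment. Defined here but not called.
create_rag_route() {
  rag_route_payload "$1" | curl "http://127.0.0.1:9180/apisix/admin/routes/ai-rag-route" -X PUT \
    -H "X-API-KEY: ${admin_key}" \
    -H "Content-Type: application/json" \
    -d @-
}
```

Once the variables are set, calling `create_rag_route` with your ai-proxy configuration as its only argument submits the Route.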

Send a POST request to the Route with the vector field name, the embedding model dimensions, and an input prompt in the request body:

curl "http://127.0.0.1:9080/rag" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "ai_rag": {
      "vector_search": {
        "fields": "contentVector"
      },
      "embeddings": {
        "input": "Which Azure services are good for DevOps?",
        "dimensions": 1024
      }
    }
  }'

You should receive an HTTP/1.1 200 OK response similar to the following:

{
  "choices": [
    {
      "content_filter_results": {
        ...
      },
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Here is a list of Azure services ...",
        "role": "assistant"
      }
    }
  ],
  "created": 1740625850,
  "id": "chatcmpl-B54gQdumpfioMPIybFnirr6rq9ZZS",
  "model": "gpt-4o-2024-05-13",
  "object": "chat.completion",
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        ...
      }
    }
  ],
  "system_fingerprint": "fp_65792305e4",
  "usage": {
    ...
  }
}