Version: 3.17

ai-rate-limiting

描述#

ai-rate-limiting 插件对发送到 LLM 服务的请求实施基于令牌的速率限制。它通过控制在指定时间范围内消耗的令牌数量来帮助管理 API 使用，确保公平的资源分配并防止服务过载。它通常与 ai-proxy 或 ai-proxy-multi 插件一起使用。

属性#

名称	类型	必选项	默认值	有效值	描述
limit	integer	False		>0	在给定时间间隔内允许的最大令牌数。`limit` 和 `instances.limit` 中至少应配置一个。如果未配置 `rules`，则为必填项。
time_window	integer	False		>0	与速率限制 `limit` 对应的时间间隔（秒）。`time_window` 和 `instances.time_window` 中至少应配置一个。如果未配置 `rules`，则为必填项。
show_limit_quota_header	boolean	False	true		如果为 true，则在响应中包含速率限制头部。当未设置 `rules` 时，头部为 `X-AI-RateLimit-Limit-`、`X-AI-RateLimit-Remaining-` 和 `X-AI-RateLimit-Reset-`，其中 `` 是实例名称。当设置了 `rules` 时，详见 `rules.header_prefix`。
limit_strategy	string	False	total_tokens	[`total_tokens`, `prompt_tokens`, `completion_tokens`, `expression`]	应用速率限制的令牌类型。`total_tokens` 是 `prompt_tokens` 和 `completion_tokens` 的总和。当设置为 `expression` 时，使用 `cost_expr` 字段动态计算令牌消耗。
cost_expr	string	False			用于动态计算令牌消耗的 Lua 算术表达式。变量从 LLM API 原始使用量响应字段注入。缺失的变量默认为 0。仅在 `limit_strategy` 为 `expression` 时有效。示例：`input_tokens + cache_creation_input_tokens + output_tokens`。
instances	array[object]	False			LLM 实例速率限制配置。
instances.name	string	True			LLM 服务实例的名称。
instances.limit	integer	True		>0	实例在给定时间间隔内允许的最大令牌数。
instances.time_window	integer	True		>0	实例速率限制 `limit` 对应的时间间隔（秒）。
rejected_code	integer	False	503	[200, 599]	当超出配额的请求被拒绝时返回的 HTTP 状态码。
rejected_msg	string	False			当超出配额的请求被拒绝时返回的响应体。
rules	array[object]	False			按顺序应用的速率限制规则数组。如果配置了此项，则优先于 `limit` 和 `time_window`。
rules.count	integer 或 string	True		>0 或变量表达式	在给定时间间隔内允许的最大令牌数。可以是静态整数或变量表达式，如 `$http_custom_limit`。
rules.time_window	integer 或 string	True		>0 或变量表达式	与速率限制 `count` 对应的时间间隔（秒）。可以是静态整数或变量表达式。
rules.key	string	True			用于计数请求的键。如果配置的键不存在，则不会执行该规则。`key` 被解释为变量组合。所有变量应以美元符号（`$`）为前缀。例如：`$http_custom_a $http_custom_b`。
rules.header_prefix	string	False			速率限制响应头部的前缀。配置后，前缀插入到头部名称中 `X-AI-` 之后。例如，将 `header_prefix` 设置为 `test` 时，头部变为 `X-AI-Test-RateLimit-Limit`、`X-AI-Test-RateLimit-Remaining` 和 `X-AI-Test-RateLimit-Reset`。未配置时，使用规则在数组中的索引作为前缀。例如，第一条规则的头部为 `X-AI-1-RateLimit-Limit`、`X-AI-1-RateLimit-Remaining` 和 `X-AI-1-RateLimit-Reset`。

示例#

以下示例演示了如何为不同场景配置 ai-rate-limiting。

note

你可以使用以下命令从 config.yaml 获取 admin_key 并保存到环境变量中：

admin_key=$(yq '.deployment.admin.admin_key[0].key' conf/config.yaml | sed 's/"//g')

与 `ai-proxy` 一起应用速率限制#

以下示例演示了如何使用 ai-proxy 代理 LLM 流量，并使用 ai-rate-limiting 在实例上配置基于令牌的速率限制。

创建一个路由并更新你的 LLM 提供商、模型、API 密钥和端点（如适用）：

Admin API
ADC
Ingress Controller

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-rate-limiting-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-35-turbo-instruct",
          "max_tokens": 512,
          "temperature": 1.0
        }
      },
      "ai-rate-limiting": {
        "limit": 300,
        "time_window": 30,
        "limit_strategy": "prompt_tokens"
      }
    }
  }'

adc.yaml

services:
  - name: ai-rate-limiting-service
    routes:
      - name: ai-rate-limiting-route
        uris:
          - /anything
        methods:
          - POST
        plugins:
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: gpt-35-turbo-instruct
              max_tokens: 512
              temperature: 1.0
          ai-rate-limiting:
            limit: 300
            time_window: 30
            limit_strategy: prompt_tokens

将配置同步到网关：

adc sync -f adc.yaml

Gateway API
APISIX Ingress Controller

ai-rate-limiting-ic.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-rate-limiting-plugin-config
spec:
  plugins:
    - name: ai-proxy
      config:
        provider: openai
        auth:
          header:
            Authorization: "Bearer your-api-key"
        options:
          model: gpt-35-turbo-instruct
          max_tokens: 512
          temperature: 1.0
    - name: ai-rate-limiting
      config:
        limit: 300
        time_window: 30
        limit_strategy: prompt_tokens
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: ai-rate-limiting-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-rate-limiting-plugin-config

ai-rate-limiting-ic.yaml

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: ai-rate-limiting-route
spec:
  ingressClassName: apisix
  http:
    - name: ai-rate-limiting-route
      match:
        paths:
          - /anything
        methods:
          - POST
      plugins:
        - name: ai-proxy
          enable: true
          config:
            provider: openai
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: gpt-35-turbo-instruct
              max_tokens: 512
              temperature: 1.0
        - name: ai-rate-limiting
          enable: true
          config:
            limit: 300
            time_window: 30
            limit_strategy: prompt_tokens

将配置应用到集群：

kubectl apply -f ai-rate-limiting-ic.yaml

向路由发送 POST 请求，在请求体中包含系统提示和示例用户问题：

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

你应该收到类似以下的响应：

{
  ...
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1 + 1 equals 2. This is a fundamental arithmetic operation where adding one unit to another results in a total of two units."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}

如果在 30 秒窗口内消耗了 300 个提示令牌的速率限制配额，所有额外的请求将被拒绝。

对多个实例中的一个进行速率限制#

以下示例演示了如何使用 ai-proxy-multi 配置两个模型进行负载均衡，将 80% 的流量转发到一个实例，20% 转发到另一个实例。此外，使用 ai-rate-limiting 对接收 80% 流量的实例配置基于令牌的速率限制，这样当配置的配额完全消耗时，额外的流量将被转发到另一个实例。

创建一个路由，对 deepseek-instance-1 实例应用 30 秒窗口内 100 个总令牌的速率限制配额，并更新你的 LLM 提供商、模型、API 密钥和端点（如适用）：

Admin API
ADC
Ingress Controller

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-rate-limiting-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy-multi": {
        "instances": [
          {
            "name": "deepseek-instance-1",
            "provider": "deepseek",
            "weight": 8,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
              }
            },
            "options": {
              "model": "deepseek-chat"
            }
          },
          {
            "name": "deepseek-instance-2",
            "provider": "deepseek",
            "weight": 2,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
              }
            },
            "options": {
              "model": "deepseek-chat"
            }
          }
        ]
      },
      "ai-rate-limiting": {
        "limit_strategy": "total_tokens",
        "instances": [
          {
            "name": "deepseek-instance-1",
            "limit": 100,
            "time_window": 30
          }
        ]
      }
    }
  }'

adc.yaml

services:
  - name: ai-rate-limiting-service
    routes:
      - name: ai-rate-limiting-route
        uris:
          - /anything
        methods:
          - POST
        plugins:
          ai-proxy-multi:
            instances:
              - name: deepseek-instance-1
                provider: deepseek
                weight: 8
                auth:
                  header:
                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
                options:
                  model: deepseek-chat
              - name: deepseek-instance-2
                provider: deepseek
                weight: 2
                auth:
                  header:
                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
                options:
                  model: deepseek-chat
          ai-rate-limiting:
            limit_strategy: total_tokens
            instances:
              - name: deepseek-instance-1
                limit: 100
                time_window: 30

将配置同步到网关：

adc sync -f adc.yaml

Gateway API
APISIX Ingress Controller

ai-rate-limiting-ic.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-rate-limiting-plugin-config
spec:
  plugins:
    - name: ai-proxy-multi
      config:
        instances:
          - name: deepseek-instance-1
            provider: deepseek
            weight: 8
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: deepseek-chat
          - name: deepseek-instance-2
            provider: deepseek
            weight: 2
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: deepseek-chat
    - name: ai-rate-limiting
      config:
        limit_strategy: total_tokens
        instances:
          - name: deepseek-instance-1
            limit: 100
            time_window: 30
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: ai-rate-limiting-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-rate-limiting-plugin-config

ai-rate-limiting-ic.yaml

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: ai-rate-limiting-route
spec:
  ingressClassName: apisix
  http:
    - name: ai-rate-limiting-route
      match:
        paths:
          - /anything
        methods:
          - POST
      plugins:
        - name: ai-proxy-multi
          enable: true
          config:
            instances:
              - name: deepseek-instance-1
                provider: deepseek
                weight: 8
                auth:
                  header:
                    Authorization: "Bearer your-api-key"
                options:
                  model: deepseek-chat
              - name: deepseek-instance-2
                provider: deepseek
                weight: 2
                auth:
                  header:
                    Authorization: "Bearer your-api-key"
                options:
                  model: deepseek-chat
        - name: ai-rate-limiting
          enable: true
          config:
            limit_strategy: total_tokens
            instances:
              - name: deepseek-instance-1
                limit: 100
                time_window: 30

将配置应用到集群：

kubectl apply -f ai-rate-limiting-ic.yaml

向路由发送 POST 请求，在请求体中包含系统提示和示例用户问题：

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

你应该收到类似以下的响应：

{
  ...
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1 + 1 equals 2. This is a fundamental arithmetic operation where adding one unit to another results in a total of two units."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}

如果 deepseek-instance-1 实例在 30 秒窗口内消耗了 100 个令牌的速率限制配额，额外的请求将全部转发到未设置速率限制的 deepseek-instance-2。

对所有实例应用相同配额#

以下示例演示了如何对 ai-rate-limiting 中的所有 LLM 上游实例应用相同的速率限制配额。

为了演示和更容易区分，你将配置一个 OpenAI 实例和一个 DeepSeek 实例作为上游 LLM 服务。

创建一个路由，对所有实例在 60 秒窗口内应用 100 个总令牌的速率限制配额，并更新你的 LLM 提供商、模型、API 密钥和端点（如适用）：

Admin API
ADC
Ingress Controller

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-rate-limiting-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy-multi": {
        "instances": [
          {
            "name": "openai-instance",
            "provider": "openai",
            "weight": 0,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            },
            "options": {
              "model": "gpt-4"
            }
          },
          {
            "name": "deepseek-instance",
            "provider": "deepseek",
            "weight": 0,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
              }
            },
            "options": {
              "model": "deepseek-chat"
            }
          }
        ]
      },
      "ai-rate-limiting": {
        "limit": 100,
        "time_window": 60,
        "rejected_code": 429,
        "limit_strategy": "total_tokens"
      }
    }
  }'

adc.yaml

services:
  - name: ai-rate-limiting-service
    routes:
      - name: ai-rate-limiting-route
        uris:
          - /anything
        methods:
          - POST
        plugins:
          ai-proxy-multi:
            instances:
              - name: openai-instance
                provider: openai
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer ${OPENAI_API_KEY}"
                options:
                  model: gpt-4
              - name: deepseek-instance
                provider: deepseek
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
                options:
                  model: deepseek-chat
          ai-rate-limiting:
            limit: 100
            time_window: 60
            rejected_code: 429
            limit_strategy: total_tokens

将配置同步到网关：

adc sync -f adc.yaml

Gateway API
APISIX Ingress Controller

ai-rate-limiting-ic.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-rate-limiting-plugin-config
spec:
  plugins:
    - name: ai-proxy-multi
      config:
        instances:
          - name: openai-instance
            provider: openai
            weight: 0
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: gpt-4
          - name: deepseek-instance
            provider: deepseek
            weight: 0
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: deepseek-chat
    - name: ai-rate-limiting
      config:
        limit: 100
        time_window: 60
        rejected_code: 429
        limit_strategy: total_tokens
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: ai-rate-limiting-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-rate-limiting-plugin-config

ai-rate-limiting-ic.yaml

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: ai-rate-limiting-route
spec:
  ingressClassName: apisix
  http:
    - name: ai-rate-limiting-route
      match:
        paths:
          - /anything
        methods:
          - POST
      plugins:
        - name: ai-proxy-multi
          enable: true
          config:
            instances:
              - name: openai-instance
                provider: openai
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer your-api-key"
                options:
                  model: gpt-4
              - name: deepseek-instance
                provider: deepseek
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer your-api-key"
                options:
                  model: deepseek-chat
        - name: ai-rate-limiting
          enable: true
          config:
            limit: 100
            time_window: 60
            rejected_code: 429
            limit_strategy: total_tokens

将配置应用到集群：

kubectl apply -f ai-rate-limiting-ic.yaml

向路由发送 POST 请求，在请求体中包含系统提示和示例用户问题：

curl -i "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Explain Newtons laws" }
    ]
  }'

你应该收到来自任一 LLM 实例的响应，类似以下内容：

{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure! Sir Isaac Newton formulated three laws of motion that describe the motion of objects. These laws are widely used in physics and engineering for studying and understanding how things move. Here they are:\n\n1. Newton's First Law - Law of Inertia: An object at rest tends to stay at rest and an object in motion tends to stay in motion with the same speed and in the same direction unless acted upon by an unbalanced force. This is also known as the principle of inertia.\n\n2. Newton's Second Law of Motion - Force and Acceleration: The acceleration of an object is directly proportional to the net force acting on it and inversely proportional to its mass. This is usually formulated as F=ma where F is the force applied, m is the mass of the object and a is the acceleration produced.\n\n3. Newton's Third Law - Action and Reaction: For every action, there is an equal and opposite reaction. This means that any force exerted on a body will create a force of equal magnitude but in the opposite direction on the object that exerted the first force.\n\nIn simple terms: \n1. If you slide a book on a table and let go, it will stop because of the friction (or force) between it and the table.\n2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 256,
    "total_tokens": 279,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": null
}

由于 total_tokens 值超过了配置的 100 配额，预期在 60 秒窗口内的下一个请求将被转发到另一个实例。

在同一个 60 秒窗口内，向路由发送另一个 POST 请求：

curl -i "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Explain Newtons laws" }
    ]
  }'

你应该收到来自另一个 LLM 实例的响应，类似以下内容：

{
  ...
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure! Newton's laws of motion are three fundamental principles that describe the relationship between the motion of an object and the forces acting on it. They were formulated by Sir Isaac Newton in the late 17th century and are foundational to classical mechanics. Here's an explanation of each law:\n\n---\n\n### **1. Newton's First Law (Law of Inertia)**\n- **Statement**: An object will remain at rest or in uniform motion in a straight line unless acted upon by an external force.\n- **What it means**: This law introduces the concept of **inertia**, which is the tendency of an object to resist changes in its state of motion. If no net force acts on an object, its velocity (speed and direction) will not change.\n- **Example**: A book lying on a table will stay at rest unless you push it. Similarly, a hockey puck sliding on ice will keep moving at a constant speed unless friction or another force slows it down.\n\n---\n\n### **2. Newton's Second Law (Law of Acceleration)**\n- **Statement**: The acceleration of an object is directly proportional to the net force acting on it and inversely proportional to its mass. Mathematically, this is expressed as:\n  \\[\n  F = ma\n  \\]\n"
      },
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 256,
    "total_tokens": 269,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "prompt_cache_hit_tokens": 0,
    "prompt_cache_miss_tokens": 13
  },
  "system_fingerprint": "fp_3a5770e1b4_prod0225"
}

由于 total_tokens 值超过了配置的 100 配额，预期在 60 秒窗口内的下一个请求将被拒绝。

在同一个 60 秒窗口内，向路由发送第三个 POST 请求：

curl -i "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Explain Newtons laws" }
    ]
  }'

你应该收到 HTTP 429 Too Many Requests 响应并观察到以下头部：

X-AI-RateLimit-Limit-openai-instance: 100
X-AI-RateLimit-Remaining-openai-instance: 0
X-AI-RateLimit-Reset-openai-instance: 0
X-AI-RateLimit-Limit-deepseek-instance: 100
X-AI-RateLimit-Remaining-deepseek-instance: 0
X-AI-RateLimit-Reset-deepseek-instance: 0

配置实例优先级和速率限制#

以下示例演示了如何配置两个具有不同优先级的模型，并对具有较高优先级的实例应用速率限制。在 fallback_strategy 设置为 ["rate_limiting"] 的情况下，一旦高优先级实例的速率限制配额完全消耗，插件应继续将请求转发到低优先级实例。

创建一个路由，对 openai-instance 实例设置速率限制和更高的优先级，并将 fallback_strategy 设置为 ["rate_limiting"]。更新你的 LLM 提供商、模型、API 密钥和端点（如适用）：

Admin API
ADC
Ingress Controller

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-rate-limiting-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy-multi": {
        "fallback_strategy": ["rate_limiting"],
        "instances": [
          {
            "name": "openai-instance",
            "provider": "openai",
            "priority": 1,
            "weight": 0,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            },
            "options": {
              "model": "gpt-4"
            }
          },
          {
            "name": "deepseek-instance",
            "provider": "deepseek",
            "priority": 0,
            "weight": 0,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
              }
            },
            "options": {
              "model": "deepseek-chat"
            }
          }
        ]
      },
      "ai-rate-limiting": {
        "instances": [
          {
            "name": "openai-instance",
            "limit": 10,
            "time_window": 60
          }
        ],
        "limit_strategy": "total_tokens"
      }
    }
  }'

adc.yaml

services:
  - name: ai-rate-limiting-service
    routes:
      - name: ai-rate-limiting-route
        uris:
          - /anything
        methods:
          - POST
        plugins:
          ai-proxy-multi:
            fallback_strategy:
              - rate_limiting
            instances:
              - name: openai-instance
                provider: openai
                priority: 1
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer ${OPENAI_API_KEY}"
                options:
                  model: gpt-4
              - name: deepseek-instance
                provider: deepseek
                priority: 0
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
                options:
                  model: deepseek-chat
          ai-rate-limiting:
            instances:
              - name: openai-instance
                limit: 10
                time_window: 60
            limit_strategy: total_tokens

将配置同步到网关：

adc sync -f adc.yaml

Gateway API
APISIX Ingress Controller

ai-rate-limiting-ic.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-rate-limiting-plugin-config
spec:
  plugins:
    - name: ai-proxy-multi
      config:
        fallback_strategy:
          - rate_limiting
        instances:
          - name: openai-instance
            provider: openai
            priority: 1
            weight: 0
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: gpt-4
          - name: deepseek-instance
            provider: deepseek
            priority: 0
            weight: 0
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: deepseek-chat
    - name: ai-rate-limiting
      config:
        instances:
          - name: openai-instance
            limit: 10
            time_window: 60
        limit_strategy: total_tokens
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: ai-rate-limiting-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-rate-limiting-plugin-config

ai-rate-limiting-ic.yaml

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: ai-rate-limiting-route
spec:
  ingressClassName: apisix
  http:
    - name: ai-rate-limiting-route
      match:
        paths:
          - /anything
        methods:
          - POST
      plugins:
        - name: ai-proxy-multi
          enable: true
          config:
            fallback_strategy:
              - rate_limiting
            instances:
              - name: openai-instance
                provider: openai
                priority: 1
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer your-api-key"
                options:
                  model: gpt-4
              - name: deepseek-instance
                provider: deepseek
                priority: 0
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer your-api-key"
                options:
                  model: deepseek-chat
        - name: ai-rate-limiting
          enable: true
          config:
            instances:
              - name: openai-instance
                limit: 10
                time_window: 60
            limit_strategy: total_tokens

将配置应用到集群：

kubectl apply -f ai-rate-limiting-ic.yaml

向路由发送 POST 请求，在请求体中包含系统提示和示例用户问题：

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

你应该收到类似以下的响应：

{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 8,
    "total_tokens": 31,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": null
}

由于 total_tokens 值超过了配置的 10 配额，预期在 60 秒窗口内的下一个请求将被转发到另一个实例。

在同一个 60 秒窗口内，向路由发送另一个 POST 请求：

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Explain Newton law" }
    ]
  }'

你应该看到类似以下的响应：

{
  ...,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Certainly! Newton's laws of motion are three fundamental principles that describe the relationship between the motion of an object and the forces acting on it. They were formulated by Sir Isaac Newton in the late 17th century and are foundational to classical mechanics.\n\n---\n\n### **1. Newton's First Law (Law of Inertia):**\n- **Statement:** An object at rest will remain at rest, and an object in motion will continue moving at a constant velocity (in a straight line at a constant speed), unless acted upon by an external force.\n- **Key Idea:** This law introduces the concept of **inertia**, which is the tendency of an object to resist changes in its state of motion.\n- **Example:** If you slide a book across a table, it eventually stops because of the force of friction acting on it. Without friction, the book would keep moving indefinitely.\n\n---\n\n### **2. Newton's Second Law (Law of Acceleration):**\n- **Statement:** The acceleration of an object is directly proportional to the net force acting on it and inversely proportional to its mass. Mathematically, this is expressed as:\n  \\[\n  F = ma\n  \\]\n  where:\n  - \\( F \\) = net force applied (in Newtons),\n  -"
      },
      ...
    }
  ],
  ...
}

按消费者进行负载均衡和速率限制#

以下示例演示了如何配置两个模型进行负载均衡，并按消费者应用速率限制。

创建消费者 johndoe 并对 openai-instance 实例设置 60 秒窗口内 10 个令牌的速率限制配额：

Admin API
ADC
Ingress Controller

curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "username": "johndoe",
    "plugins": {
      "ai-rate-limiting": {
        "instances": [
          {
            "name": "openai-instance",
            "limit": 10,
            "time_window": 60
          }
        ],
        "rejected_code": 429,
        "limit_strategy": "total_tokens"
      }
    }
  }'

为 johndoe 配置 key-auth Credential：

curl "http://127.0.0.1:9180/apisix/admin/consumers/johndoe/credentials" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "cred-john-key-auth",
    "plugins": {
      "key-auth": {
        "key": "john-key"
      }
    }
  }'

创建另一个消费者 janedoe 并对 deepseek-instance 实例设置 60 秒窗口内 10 个令牌的速率限制配额：

curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "username": "janedoe",
    "plugins": {
      "ai-rate-limiting": {
        "instances": [
          {
            "name": "deepseek-instance",
            "limit": 10,
            "time_window": 60
          }
        ],
        "rejected_code": 429,
        "limit_strategy": "total_tokens"
      }
    }
  }'

为 janedoe 配置 key-auth Credential：

curl "http://127.0.0.1:9180/apisix/admin/consumers/janedoe/credentials" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "cred-jane-key-auth",
    "plugins": {
      "key-auth": {
        "key": "jane-key"
      }
    }
  }'

创建一个路由并更新你的 LLM 提供商、模型、API 密钥和端点（如适用）：

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-rate-limiting-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "key-auth": {},
      "ai-proxy-multi": {
        "fallback_strategy": ["rate_limiting"],
        "instances": [
          {
            "name": "openai-instance",
            "provider": "openai",
            "weight": 0,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            },
            "options": {
              "model": "gpt-4"
            }
          },
          {
            "name": "deepseek-instance",
            "provider": "deepseek",
            "weight": 0,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
              }
            },
            "options": {
              "model": "deepseek-chat"
            }
          }
        ]
      }
    }
  }'

创建两个消费者和一个启用按消费者速率限制的路由：

adc.yaml

consumers:
  - username: johndoe
    plugins:
      ai-rate-limiting:
        instances:
          - name: openai-instance
            limit: 10
            time_window: 60
        rejected_code: 429
        limit_strategy: total_tokens
    credentials:
      - name: key-auth
        type: key-auth
        config:
          key: john-key
  - username: janedoe
    plugins:
      ai-rate-limiting:
        instances:
          - name: deepseek-instance
            limit: 10
            time_window: 60
        rejected_code: 429
        limit_strategy: total_tokens
    credentials:
      - name: key-auth
        type: key-auth
        config:
          key: jane-key
services:
  - name: ai-rate-limiting-service
    routes:
      - name: ai-rate-limiting-route
        uris:
          - /anything
        methods:
          - POST
        plugins:
          key-auth: {}
          ai-proxy-multi:
            fallback_strategy:
              - rate_limiting
            instances:
              - name: openai-instance
                provider: openai
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer ${OPENAI_API_KEY}"
                options:
                  model: gpt-4
              - name: deepseek-instance
                provider: deepseek
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
                options:
                  model: deepseek-chat

将配置同步到网关：

adc sync -f adc.yaml

Gateway API
APISIX Ingress Controller

创建两个消费者和一个启用按消费者速率限制的路由：

ai-rate-limiting-ic.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: Consumer
metadata:
  namespace: aic
  name: johndoe
spec:
  gatewayRef:
    name: apisix
  credentials:
    - type: key-auth
      name: primary-key
      config:
        key: john-key
  plugins:
    - name: ai-rate-limiting
      config:
        instances:
          - name: openai-instance
            limit: 10
            time_window: 60
        rejected_code: 429
        limit_strategy: total_tokens
---
apiVersion: apisix.apache.org/v1alpha1
kind: Consumer
metadata:
  namespace: aic
  name: janedoe
spec:
  gatewayRef:
    name: apisix
  credentials:
    - type: key-auth
      name: primary-key
      config:
        key: jane-key
  plugins:
    - name: ai-rate-limiting
      config:
        instances:
          - name: deepseek-instance
            limit: 10
            time_window: 60
        rejected_code: 429
        limit_strategy: total_tokens
---
apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-rate-limiting-plugin-config
spec:
  plugins:
    - name: key-auth
      config:
        _meta:
          disable: false
    - name: ai-proxy-multi
      config:
        fallback_strategy:
          - rate_limiting
        instances:
          - name: openai-instance
            provider: openai
            weight: 0
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: gpt-4
          - name: deepseek-instance
            provider: deepseek
            weight: 0
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: deepseek-chat
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: ai-rate-limiting-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-rate-limiting-plugin-config

note

ApisixConsumer CRD 目前不支持在消费者上配置除 authParameter 中允许的认证插件之外的其他插件。此示例无法使用 APISIX CRD 完成。

将配置应用到集群：

kubectl apply -f ai-rate-limiting-ic.yaml

向路由发送不带任何消费者密钥的 POST 请求：

curl -i "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

你应该收到 HTTP/1.1 401 Unauthorized 响应。

使用 johndoe 的密钥向路由发送 POST 请求：

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -H 'apikey: john-key' \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

你应该收到类似以下的响应：

{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 8,
    "total_tokens": 31,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": null
}

由于 total_tokens 值超过了 johndoe 的 openai 实例配置配额，预期在 60 秒窗口内来自 johndoe 的下一个请求将被转发到 deepseek 实例。

在同一个 60 秒窗口内，使用 johndoe 的密钥向路由发送另一个 POST 请求：

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -H 'apikey: john-key' \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Explain Newtons laws to me" }
    ]
  }'

你应该看到类似以下的响应：

{
  ...,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Certainly! Newton's laws of motion are three fundamental principles that describe the relationship between the motion of an object and the forces acting on it. They were formulated by Sir Isaac Newton in the late 17th century and are foundational to classical mechanics.\n\n---\n\n### **1. Newton's First Law (Law of Inertia):**\n- **Statement:** An object at rest will remain at rest, and an object in motion will continue moving at a constant velocity (in a straight line at a constant speed), unless acted upon by an external force.\n- **Key Idea:** This law introduces the concept of **inertia**, which is the tendency of an object to resist changes in its state of motion.\n- **Example:** If you slide a book across a table, it eventually stops because of the force of friction acting on it. Without friction, the book would keep moving indefinitely.\n\n---\n\n### **2. Newton's Second Law (Law of Acceleration):**\n- **Statement:** The acceleration of an object is directly proportional to the net force acting on it and inversely proportional to its mass. Mathematically, this is expressed as:\n  \\[\n  F = ma\n  \\]\n  where:\n  - \\( F \\) = net force applied (in Newtons),\n  -"
      },
      ...
    }
  ],
  ...
}

使用 janedoe 的密钥向路由发送 POST 请求：

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -H 'apikey: jane-key' \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

你应该收到类似以下的响应：

{
  ...,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The sum of 1 and 1 is 2. This is a basic arithmetic operation where you combine two units to get a total of two units."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 31,
    "total_tokens": 45,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "prompt_cache_hit_tokens": 0,
    "prompt_cache_miss_tokens": 14
  },
  "system_fingerprint": "fp_3a5770e1b4_prod0225"
}

由于 total_tokens 值超过了 janedoe 的 deepseek 实例配置配额，预期在 60 秒窗口内来自 janedoe 的下一个请求将被转发到 openai 实例。

在同一个 60 秒窗口内，使用 janedoe 的密钥向路由发送另一个 POST 请求：

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -H 'apikey: jane-key' \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Explain Newtons laws to me" }
    ]
  }'

你应该看到类似以下的响应：

{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure, here are Newton's three laws of motion:\n\n1) Newton's First Law, also known as the Law of Inertia, states that an object at rest will stay at rest, and an object in motion will stay in motion, unless acted on by an external force. In simple words, this law suggests that an object will keep doing whatever it is doing until something causes it to do otherwise. \n\n2) Newton's Second Law states that the force acting on an object is equal to the mass of that object times its acceleration (F=ma). This means that force is directly proportional to mass and acceleration. The heavier the object and the faster it accelerates, the greater the force.\n\n3) Newton's Third Law, also known as the law of action and reaction, states that for every action, there is an equal and opposite reaction. Essentially, any force exerted onto a body will create a force of equal magnitude but in the opposite direction on the object that exerted the first force.\n\nRemember, these laws become less accurate when considering speeds near the speed of light (where Einstein's theory of relativity becomes more appropriate) or objects very small or very large. However, for everyday situations, they provide a good model of how things move.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}

这显示了 ai-proxy-multi 根据消费者在 ai-rate-limiting 中的速率限制规则对流量进行负载均衡。

按规则进行速率限制#

以下示例演示了如何配置插件根据请求属性应用不同的速率限制规则。在此示例中，速率限制基于表示调用者访问层级的 HTTP 头部值进行应用。所有规则按顺序执行。如果配置的键不存在，则跳过相应的规则。

创建一个带有 ai-rate-limiting 插件的路由，根据请求头部应用不同的速率限制，允许按订阅（X-Subscription-ID）进行速率限制，并对试用用户（X-Trial-ID）实施更严格的限制：

Admin API
ADC
Ingress Controller

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-rate-limiting-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy-multi": {
        "fallback_strategy": ["rate_limiting"],
        "instances": [
          {
            "name": "openai-instance",
            "provider": "openai",
            "priority": 1,
            "weight": 0,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            },
            "options": {
              "model": "gpt-4"
            }
          },
          {
            "name": "deepseek-instance",
            "provider": "deepseek",
            "priority": 0,
            "weight": 0,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
              }
            },
            "options": {
              "model": "deepseek-chat"
            }
          }
        ]
      },
      "ai-rate-limiting": {
        "rejected_code": 429,
        "rules": [
          {
            "key": "${http_x_subscription_id}",
            "count": "${http_x_custom_count ?? 500}",
            "time_window": 60
          },
          {
            "key": "${http_x_trial_id}",
            "count": 50,
            "time_window": 60
          }
        ]
      }
    }
  }'

adc.yaml

services:
  - name: ai-rate-limiting-service
    routes:
      - name: ai-rate-limiting-route
        uris:
          - /anything
        methods:
          - POST
        plugins:
          ai-proxy-multi:
            fallback_strategy:
              - rate_limiting
            instances:
              - name: openai-instance
                provider: openai
                priority: 1
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer ${OPENAI_API_KEY}"
                options:
                  model: gpt-4
              - name: deepseek-instance
                provider: deepseek
                priority: 0
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
                options:
                  model: deepseek-chat
          ai-rate-limiting:
            rejected_code: 429
            rules:
              - key: "${http_x_subscription_id}"
                count: "${http_x_custom_count ?? 500}"
                time_window: 60
              - key: "${http_x_trial_id}"
                count: 50
                time_window: 60

将配置同步到网关：

adc sync -f adc.yaml

Gateway API
APISIX Ingress Controller

ai-rate-limiting-ic.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-rate-limiting-plugin-config
spec:
  plugins:
    - name: ai-proxy-multi
      config:
        fallback_strategy:
          - rate_limiting
        instances:
          - name: openai-instance
            provider: openai
            priority: 1
            weight: 0
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: gpt-4
          - name: deepseek-instance
            provider: deepseek
            priority: 0
            weight: 0
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: deepseek-chat
    - name: ai-rate-limiting
      config:
        rejected_code: 429
        rules:
          - key: "${http_x_subscription_id}"
            count: "${http_x_custom_count ?? 500}"
            time_window: 60
          - key: "${http_x_trial_id}"
            count: 50
            time_window: 60
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: ai-rate-limiting-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-rate-limiting-plugin-config

ai-rate-limiting-ic.yaml

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: ai-rate-limiting-route
spec:
  ingressClassName: apisix
  http:
    - name: ai-rate-limiting-route
      match:
        paths:
          - /anything
        methods:
          - POST
      plugins:
        - name: ai-proxy-multi
          enable: true
          config:
            fallback_strategy:
              - rate_limiting
            instances:
              - name: openai-instance
                provider: openai
                priority: 1
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer your-api-key"
                options:
                  model: gpt-4
              - name: deepseek-instance
                provider: deepseek
                priority: 0
                weight: 0
                auth:
                  header:
                    Authorization: "Bearer your-api-key"
                options:
                  model: deepseek-chat
        - name: ai-rate-limiting
          enable: true
          config:
            rejected_code: 429
            rules:
              - key: "${http_x_subscription_id}"
                count: "${http_x_custom_count ?? 500}"
                time_window: 60
              - key: "${http_x_trial_id}"
                count: 50
                time_window: 60

将配置应用到集群：

kubectl apply -f ai-rate-limiting-ic.yaml

第一条规则使用 X-Subscription-ID 请求头部的值作为速率限制键，并根据 X-Custom-Count 头部动态设置请求限制。如果未提供该头部，则应用默认的 500 个令牌计数。第二条规则使用 X-Trial-ID 请求头部的值作为速率限制键，设置更严格的 50 个令牌限制。

要验证速率限制，使用相同的订阅 ID 向路由发送多个以下请求：

curl "http://127.0.0.1:9080/anything" -i -X POST \
  -H "X-Subscription-ID: sub-123456789" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

这些请求应匹配第一条规则，默认令牌计数为 500。你应该看到配额内的请求返回 HTTP/1.1 200 OK，而超出配额的请求返回 HTTP/1.1 429 Too Many Requests：

HTTP/1.1 200 OK
...
X-AI-1-RateLimit-Limit: 500
X-AI-1-RateLimit-Remaining: 499
X-AI-1-RateLimit-Reset: 60

HTTP/1.1 429 Too Many Requests
...
X-AI-1-RateLimit-Limit: 500
X-AI-1-RateLimit-Remaining: 0
X-AI-1-RateLimit-Reset: 5.871000051498

等待时间窗口重置。使用相同的订阅 ID 向路由发送多个以下请求，并将 X-Custom-Count 头部设置为 10：

curl "http://127.0.0.1:9080/anything" -i -X POST \
  -H "X-Subscription-ID: sub-123456789" \
  -H "X-Custom-Count: 10" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

这些请求应匹配第一条规则，自定义令牌计数为 10。你应该看到配额内的请求返回 HTTP/1.1 200 OK，而超出配额的请求返回 HTTP/1.1 429 Too Many Requests：

HTTP/1.1 200 OK
...
X-AI-1-RateLimit-Limit: 10
X-AI-1-RateLimit-Remaining: 9
X-AI-1-RateLimit-Reset: 60

HTTP/1.1 429 Too Many Requests
...
X-AI-1-RateLimit-Limit: 10
X-AI-1-RateLimit-Remaining: 0
X-AI-1-RateLimit-Reset: 40.422000169754

最后，使用试用 ID 向路由发送多个以下请求：

curl "http://127.0.0.1:9080/anything" -i -X POST \
  -H "X-Trial-ID: trial-123456789" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

这些请求应匹配第二条规则，令牌计数为 50。你应该看到配额内的请求返回 HTTP/1.1 200 OK，而超出配额的请求返回 HTTP/1.1 429 Too Many Requests：

HTTP/1.1 200 OK
...
X-AI-2-RateLimit-Limit: 50
X-AI-2-RateLimit-Remaining: 49
X-AI-2-RateLimit-Reset: 60

HTTP/1.1 429 Too Many Requests
...
X-AI-2-RateLimit-Limit: 50
X-AI-2-RateLimit-Remaining: 0
X-AI-2-RateLimit-Reset: 44

描述#

属性#

示例#

note

与 ai-proxy 一起应用速率限制#

对多个实例中的一个进行速率限制#

对所有实例应用相同配额#

配置实例优先级和速率限制#

按消费者进行负载均衡和速率限制#

note

按规则进行速率限制#

与 `ai-proxy` 一起应用速率限制#