Version: 3.17

ai-aliyun-content-moderation

Description

The ai-aliyun-content-moderation Plugin integrates with Aliyun Machine-Assisted Moderation Plus to check request and response content for risks such as profanity, hate speech, insult, harassment, and violence when proxying to LLMs. The Plugin rejects the content unless the evaluated risk level is lower than the configured threshold.

Ensure that the access_key_secret is correctly configured in the Plugin. If it is misconfigured, the moderation check fails and the request may still be forwarded to the Upstream LLM. In that case, the Plugin writes a "Specified signature is not matched with our calculation" error to the gateway's error log.
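
Because a misconfigured secret can fail silently from the client's point of view, it may help to search the error log for the message quoted above. The following is a minimal sketch; the has_signature_error helper is hypothetical and the logs/error.log path is an assumption that depends on your deployment:

```shell
# has_signature_error LOG_FILE: succeed if the log contains the
# Aliyun signature mismatch error quoted above.
has_signature_error() {
  grep -q "Specified signature is not matched with our calculation" "$1"
}

# Assumed log location; adjust LOG_FILE to match your deployment.
LOG_FILE="${LOG_FILE:-logs/error.log}"
if [ -f "$LOG_FILE" ] && has_signature_error "$LOG_FILE"; then
  echo "signature mismatch found: check access_key_secret"
fi
```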

The ai-aliyun-content-moderation Plugin should be used with either the ai-proxy or the ai-proxy-multi Plugin, which proxies the LLM requests.

Attributes

| Name | Type | Required | Default | Valid values | Description |
|------|------|----------|---------|--------------|-------------|
| access_key_id | string | True | | | Aliyun access key ID. |
| access_key_secret | string | True | | | Aliyun secret access key. The value is encrypted with AES before being stored in etcd. |
| region_id | string | True | | | Aliyun region ID. |
| endpoint | string | True | | | Aliyun endpoint. |
| check_request | boolean | False | true | | If true, moderate the request content. |
| check_response | boolean | False | false | | If true, moderate the response content. |
| stream_check_mode | string | False | "final_packet" | realtime, final_packet | Streaming moderation mode. realtime: batched checks during streaming. final_packet: append risk level at the end. |
| stream_check_cache_size | integer | False | 128 | >= 1 | Maximum bytes per moderation batch in realtime mode. Length is measured using Lua string length, so for UTF-8 text non-ASCII characters may consume multiple bytes. |
| stream_check_interval | number | False | 3 | >= 0.1 | Seconds between batch checks in realtime mode. |
| request_check_service | string | False | "llm_query_moderation" | | Aliyun service for request moderation. |
| request_check_length_limit | number | False | 2000 | | Request content length limit, in bytes. Length is measured using Lua string length, so for UTF-8 text non-ASCII characters may consume multiple bytes. If exceeded, the content will be sent in chunks. For instance, if the request content is 250 bytes and the request_check_length_limit is set to 100, then the content will be sent in 3 requests to Aliyun. |
| response_check_service | string | False | "llm_response_moderation" | | Aliyun service for response moderation. |
| response_check_length_limit | number | False | 5000 | | Response content length limit, in bytes. Length is measured using Lua string length, so for UTF-8 text non-ASCII characters may consume multiple bytes. If exceeded, the content will be sent in chunks. For instance, if the response content is 250 bytes and the response_check_length_limit is set to 100, then the content will be sent in 3 requests to Aliyun. |
| risk_level_bar | string | False | "high" | none, low, medium, high, max | If the evaluated risk level is lower than the risk_level_bar, the request or response will be passed through to Upstream LLM or client respectively. |
| deny_code | number | False | 200 | | Rejection HTTP status code. |
| deny_message | string | False | | | Rejection message. |
| timeout | integer | False | 10000 | >= 1 | Timeout in milliseconds. |
| keepalive | boolean | False | true | | If true, enable HTTP connection keepalive to Aliyun. |
| keepalive_pool | integer | False | 30 | >= 1 | Maximum number of connections in the keepalive pool. |
| keepalive_timeout | integer | False | 60000 | >= 1000 | Keepalive timeout in milliseconds. |
| ssl_verify | boolean | False | true | | If true, enable SSL certificate verification. |
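
The length limits above count bytes rather than characters, and content over the limit is sent in several requests. The sketch below only reproduces that arithmetic so you can estimate how many moderation requests a given payload may trigger. The byte_length and chunk_count helpers are hypothetical, and how the Plugin picks the actual split points is not covered here:

```shell
# byte_length STRING: print the length of STRING in bytes, not characters.
byte_length() {
  printf '%s' "$1" | wc -c | tr -d '[:space:]'
}

# chunk_count LENGTH LIMIT: print the number of chunks needed, rounding up.
chunk_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

chunk_count 250 100                     # prints 3, matching the table's example
chunk_count "$(byte_length '你好')" 4    # 2 characters are 6 bytes in UTF-8, prints 2
```

With the default request_check_length_limit of 2000, a request made mostly of non-ASCII text can reach the limit with far fewer than 2000 characters.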

Examples

The following examples use OpenAI as the Upstream service provider. Before proceeding, create an OpenAI account and obtain an API key. If you are working with other LLM providers, please refer to the provider's documentation to obtain an API key.

Additionally, create an Aliyun account, enable Machine-Assisted Moderation Plus, and obtain the endpoint, region ID, access key ID, and access key secret.

Note: You can fetch the admin_key from config.yaml and save it to an environment variable with the following command:

admin_key=$(yq '.deployment.admin.admin_key[0].key' conf/config.yaml | sed 's/"//g')

You can optionally save the Aliyun and OpenAI information to environment variables:

# Replace with your data
export OPENAI_API_KEY=your-openai-api-key
export ALIYUN_ENDPOINT=https://green-cip.cn-shanghai.aliyuncs.com
export ALIYUN_REGION_ID=cn-shanghai
export ALIYUN_ACCESS_KEY_ID=your-aliyun-access-key-id
export ALIYUN_ACCESS_KEY_SECRET=your-aliyun-access-key-secret
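
The Route configurations in the examples interpolate these variables directly into the JSON body, so an unset variable would silently produce an empty string. As a precaution, you could check them first. This is a minimal sketch and require_vars is a hypothetical helper, not part of APISIX:

```shell
# require_vars NAME...: report every variable that is unset or empty
# and return non-zero if at least one is missing.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_vars OPENAI_API_KEY ALIYUN_ENDPOINT ALIYUN_REGION_ID \
  ALIYUN_ACCESS_KEY_ID ALIYUN_ACCESS_KEY_SECRET \
  || echo "set the variables listed above before creating the Route" >&2
```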

Moderate Request Content Toxicity

The following example demonstrates how you can use the Plugin to moderate content toxicity in requests and customize the rejection code and message.

Create a Route to the LLM chat completion endpoint using the ai-proxy Plugin and configure the integration details as well as the deny_code and deny_message in the ai-aliyun-content-moderation Plugin:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-aliyun-content-moderation-route",
    "uri": "/anything",
    "plugins": {
      "ai-aliyun-content-moderation": {
        "endpoint": "'"$ALIYUN_ENDPOINT"'",
        "region_id": "'"$ALIYUN_REGION_ID"'",
        "access_key_id": "'"$ALIYUN_ACCESS_KEY_ID"'",
        "access_key_secret": "'"$ALIYUN_ACCESS_KEY_SECRET"'",
        "deny_code": 400,
        "deny_message": "Request contains forbidden content, such as hate speech or violence."
      },
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        }
      }
    }
  }'

Send a POST request to the Route with a system prompt and a user question with a profane word in the request body:

curl -i "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Stupid, what is 1+1?" }
    ]
  }'

You should receive an HTTP/1.1 400 Bad Request response and see the following message:

{
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "total_tokens": 0
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Request contains forbidden content, such as hate speech or violence."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "model": "gpt-4",
  "id": "c9466bbf-e010-469d-949a-a10f25525964"
}

Send another request to the Route with a typical question in the request body:

curl -i "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

You should receive an HTTP/1.1 200 OK response with the model output:

{
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ]
}

Adjust Risk Level Threshold

The following example demonstrates how you can adjust the risk level threshold, which determines whether a request or response is allowed through.

Create a Route to the LLM chat completion endpoint using the ai-proxy Plugin and configure the risk_level_bar in ai-aliyun-content-moderation to be high:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-aliyun-content-moderation-route",
    "uri": "/anything",
    "plugins": {
      "ai-aliyun-content-moderation": {
        "endpoint": "'"$ALIYUN_ENDPOINT"'",
        "region_id": "'"$ALIYUN_REGION_ID"'",
        "access_key_id": "'"$ALIYUN_ACCESS_KEY_ID"'",
        "access_key_secret": "'"$ALIYUN_ACCESS_KEY_SECRET"'",
        "deny_code": 400,
        "deny_message": "Request contains forbidden content, such as hate speech or violence.",
        "risk_level_bar": "high"
      },
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4"
        }
      }
    }
  }'

Send a POST request to the Route with a system prompt and a user question with a profane word in the request body:

curl -i "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Stupid, what is 1+1?" }
    ]
  }'

You should receive an HTTP/1.1 400 Bad Request response and see the following message:

{
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "total_tokens": 0
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Request contains forbidden content, such as hate speech or violence."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "model": "gpt-4",
  "id": "c9466bbf-e010-469d-949a-a10f25525964"
}

Update the risk_level_bar in the Plugin to max:

curl "http://127.0.0.1:9180/apisix/admin/routes/ai-aliyun-content-moderation-route" -X PATCH \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "plugins": {
      "ai-aliyun-content-moderation": {
        "risk_level_bar": "max"
      }
    }
  }'

Send the same messages to the Route. The model field is omitted here because the Route already sets it in the ai-proxy options:

curl -i "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Stupid, what is 1+1?" }
    ]
  }'

You should receive an HTTP/1.1 200 OK response with the model output:

{
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ]
}

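The request is allowed through because the word "stupid" is evaluated at a risk level of high, which is lower than the configured risk_level_bar of max. With the earlier risk_level_bar of high, the same risk level was not lower than the threshold, so the request was rejected.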
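
The comparison can be sketched as a small ranking function. This is only an illustration of the rule described in this document, assuming the levels are ordered as listed in the Attributes section (none, low, medium, high, max). The risk_rank and decision helpers are hypothetical and do not reflect the Plugin's internal code:

```shell
# risk_rank LEVEL: map a risk level name to a number, using the assumed
# order none < low < medium < high < max.
risk_rank() {
  case "$1" in
    none) echo 0 ;;
    low) echo 1 ;;
    medium) echo 2 ;;
    high) echo 3 ;;
    max) echo 4 ;;
    *) echo "unknown risk level: $1" >&2; return 1 ;;
  esac
}

# decision EVALUATED BAR: print "pass" if EVALUATED is lower than BAR,
# otherwise print "reject".
decision() {
  evaluated=$(risk_rank "$1") || return 1
  bar=$(risk_rank "$2") || return 1
  if [ "$evaluated" -lt "$bar" ]; then echo pass; else echo reject; fi
}

decision high high   # prints reject
decision high max    # prints pass
```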