CVE-2024-11040 in vLLM
Summary
by MITRE • 03/20/2025
vllm-project vllm version 0.5.2.2 is vulnerable to Denial of Service attacks. The issue occurs in the 'POST /v1/completions' and 'POST /v1/embeddings' endpoints. For 'POST /v1/completions', enabling 'use_beam_search' and setting 'best_of' to a high value causes the HTTP connection to time out, with vllm ceasing effective work and the request remaining in a 'pending' state, blocking new completion requests. For 'POST /v1/embeddings', supplying invalid inputs to the JSON object causes an issue in the background loop, resulting in all further completion requests returning a 500 HTTP error code ('Internal Server Error') until vllm is restarted.
Analysis
by VulDB Data Team • 03/20/2025
The vulnerability identified as CVE-2024-11040 affects vllm-project vllm version 0.5.2.2 and poses a significant denial-of-service risk to the availability of its API endpoints. It shows up as two distinct attack vectors in vLLM's HTTP interface, one against POST /v1/completions and one against POST /v1/embeddings. Both stem from inadequate input validation and resource management in the request handling logic, which lets a malicious client make the service unavailable to every other user. vLLM is a large language model inference server that exposes an OpenAI-compatible HTTP API, so it is often a critical component of AI-powered applications and services.
The flaw works through two separate mechanisms that exploit different parts of the processing pipeline. In the completions endpoint, enabling the use_beam_search parameter together with a high best_of value drives the server into excessive resource consumption. The HTTP connection times out, vLLM stops doing useful work, and the request stays in a pending state that blocks new completion requests from being served. In the embeddings endpoint, supplying invalid values inside the JSON request body triggers an error in the background processing loop that the server does not recover from. Once that loop has failed, every subsequent completion request returns HTTP 500 (Internal Server Error) until the service is restarted. Both issues reflect weak error handling and resource management, because a single request is able to exhaust or break a resource that all other requests share.
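The embeddings failure mode can be illustrated with a toy asyncio model. This is a minimal sketch, not vLLM's code: ToyEngine, EngineDeadError, handle and demo are hypothetical names, and the model only assumes what the summary describes, namely a single background loop shared by all requests. With isolate_failures=False one invalid payload stops the loop and every later request gets a 500-style status. With isolate_failures=True the same error is confined to the request that caused it.

```python
import asyncio


class EngineDeadError(RuntimeError):
    """Hypothetical error raised for every request once the shared loop has died."""


class ToyEngine:
    """Hypothetical single-loop engine used only to model the failure mode."""

    def __init__(self, isolate_failures: bool) -> None:
        self.isolate_failures = isolate_failures
        self.queue: asyncio.Queue = asyncio.Queue()
        self.dead = False
        self.loop_task = None

    def start(self) -> None:
        # Must be called from inside a running event loop.
        self.loop_task = asyncio.create_task(self._background_loop())

    async def _background_loop(self) -> None:
        while True:
            payload, fut = await self.queue.get()
            try:
                fut.set_result(self._process(payload))
            except Exception as exc:
                fut.set_exception(exc)
                if not self.isolate_failures:
                    # Vulnerable pattern: one bad request stops the shared loop.
                    self.dead = True
                    return
                # Isolated pattern: only this request fails; keep serving.

    @staticmethod
    def _process(payload):
        # Stand-in for embedding work; rejects anything that is not a string.
        if not isinstance(payload, str):
            raise TypeError(f"invalid input: {payload!r}")
        return len(payload)

    async def submit(self, payload):
        if self.dead:
            raise EngineDeadError("background loop is not running")
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((payload, fut))
        return await fut


async def handle(engine: ToyEngine, payload):
    """Map engine outcomes to HTTP-like status codes for the demo."""
    try:
        return 200, await engine.submit(payload)
    except EngineDeadError:
        return 500, None
    except TypeError:
        return 400, None


async def demo(isolate: bool) -> list:
    engine = ToyEngine(isolate_failures=isolate)
    engine.start()
    results = [await handle(engine, p) for p in ["hello", ["bad", 1], "world"]]
    engine.loop_task.cancel()
    return [code for code, _ in results]


if __name__ == "__main__":
    print(asyncio.run(demo(isolate=False)))
    print(asyncio.run(demo(isolate=True)))
```

Running the script prints [200, 400, 500] for the vulnerable variant and [200, 400, 200] for the isolated one. The third request is well formed in both runs; only the shared-loop design turns the earlier bad request into an outage.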
The operational impact of CVE-2024-11040 goes beyond a transient disruption, because it undermines the availability of every AI-powered application that depends on the vLLM service. Organizations running the vulnerable version may experience complete outages of completion requests, with the server unresponsive for as long as the blocking request remains pending. The embeddings vector is more insidious: a single malformed request causes every subsequent completion request to fail, and the outage lasts until an operator restarts the service. From an attacker's perspective, both issues are low-effort and high-impact, since they require minimal technical expertise and can be automated to sustain a denial of service. Both also break the principle of failure isolation, because the server does not contain the failure of one request and keep serving the rest.
Organizations should address CVE-2024-11040 promptly, starting with an upgrade to a patched release of vllm-project vllm as soon as one is available. In the interim, administrators should apply strict input validation and rate limiting to both the completions and embeddings endpoints. Specifically, the deployment should enforce a reasonable upper limit on the best_of parameter for beam search requests and validate embedding request bodies against a schema, so that malformed inputs never reach the background processing loop. Network-level components such as API gateways or load balancers can enforce these checks and block request patterns that match the vulnerability. Monitoring should also be extended to flag requests stuck in a pending state and sustained runs of HTTP 500 responses, since these mark the onset of the two denial-of-service conditions.
This vulnerability aligns with CWE-400 (Uncontrolled Resource Consumption) and CWE-707 (Improper Neutralization), and maps to the ATT&CK technique T1499.004 (Endpoint Denial of Service: Application or System Exploitation), which underscores the need for defensive measures across multiple layers.
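The parameter cap and schema check recommended above can be sketched as a pre-filter that a gateway or middleware runs before forwarding a request to vLLM. This is a hypothetical helper, not a vLLM option: MAX_BEST_OF and MAX_EMBEDDING_ITEMS are illustrative limits, and the accepted embedding input shapes (a string, a list of strings, a list of token IDs, or a list of token-ID lists) are assumed to follow the OpenAI embeddings API that vLLM's server emulates. A non-empty result would be returned to the client as HTTP 400 instead of being forwarded.

```python
from typing import Any

# Illustrative limits (assumptions); tune them to the deployment's capacity.
MAX_BEST_OF = 4
MAX_EMBEDDING_ITEMS = 256


def _is_token_list(value: Any) -> bool:
    """True for a non-empty list of non-negative integer token IDs."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(t, int) and not isinstance(t, bool) and t >= 0 for t in value)
    )


def validate_completion(body: Any) -> list[str]:
    """Return problems found in a /v1/completions body; an empty list means OK."""
    if not isinstance(body, dict):
        return ["body must be a JSON object"]
    errors = []
    best_of = body.get("best_of")
    if best_of is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(best_of, int) or isinstance(best_of, bool) or best_of < 1:
            errors.append("best_of must be a positive integer")
        elif best_of > MAX_BEST_OF:
            errors.append(f"best_of must not exceed {MAX_BEST_OF}")
    use_beam_search = body.get("use_beam_search", False)
    if not isinstance(use_beam_search, bool):
        errors.append("use_beam_search must be a boolean")
    return errors


def validate_embedding(body: Any) -> list[str]:
    """Return problems found in a /v1/embeddings body; an empty list means OK."""
    if not isinstance(body, dict):
        return ["body must be a JSON object"]
    inp = body.get("input")
    if isinstance(inp, str):
        return [] if inp else ["input must not be an empty string"]
    if _is_token_list(inp):
        return []
    if isinstance(inp, list) and 0 < len(inp) <= MAX_EMBEDDING_ITEMS:
        if all(isinstance(s, str) and s for s in inp):
            return []
        if all(_is_token_list(s) for s in inp):
            return []
    return [
        "input must be a non-empty string, a list of strings, a list of token IDs, "
        f"or a list of token-ID lists (at most {MAX_EMBEDDING_ITEMS} items)"
    ]


if __name__ == "__main__":
    print(validate_completion({"prompt": "hi", "use_beam_search": True, "best_of": 100}))
    print(validate_embedding({"input": [["x"], 3]}))
```

Rejecting these requests at the edge does not repair the underlying bug, but it keeps both known triggers away from the shared engine until an upgrade is possible. An allow-list of input shapes is used instead of a block-list because the summary does not say which invalid values break the loop.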
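For the monitoring recommendation, the embeddings failure has a distinctive signature: after the trigger, nearly every response is a 500. A sliding-window error-ratio check could feed that signal into alerting. The sketch below assumes hypothetical names and defaults (ErrorRateMonitor, window_s, threshold, min_requests) and is not part of vLLM. It does not detect the beam-search hang, which appears as requests that never complete; catching that would need a separate check on the age of in-flight requests.

```python
import time
from collections import deque
from typing import Optional


class ErrorRateMonitor:
    """Hypothetical sliding-window detector for a high ratio of HTTP 5xx responses."""

    def __init__(self, window_s: float = 60.0, threshold: float = 0.5,
                 min_requests: int = 20) -> None:
        self.window_s = window_s          # length of the sliding window in seconds
        self.threshold = threshold        # 5xx ratio that triggers an alarm
        self.min_requests = min_requests  # avoid alarming on tiny samples
        self._events: deque = deque()     # (timestamp, is_server_error) pairs
        self._errors = 0

    def record(self, status: int, now: Optional[float] = None) -> None:
        """Record one response status code observed at time `now`."""
        now = time.monotonic() if now is None else now
        is_error = 500 <= status <= 599
        self._events.append((now, is_error))
        self._errors += is_error
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop events older than the window and keep the error count in sync.
        while self._events and now - self._events[0][0] > self.window_s:
            _, was_error = self._events.popleft()
            self._errors -= was_error

    def alarming(self, now: Optional[float] = None) -> bool:
        """True when enough recent responses exist and the 5xx ratio is too high."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        total = len(self._events)
        return total >= self.min_requests and self._errors / total >= self.threshold


if __name__ == "__main__":
    mon = ErrorRateMonitor(window_s=60, threshold=0.5, min_requests=10)
    for i in range(10):
        mon.record(200, now=float(i))
    for i in range(10, 25):
        mon.record(500, now=float(i))
    print(mon.alarming(now=25.0))
```

For example, ten 200 responses followed by fifteen 500 responses within the window give an error ratio of 0.6, which trips the 0.5 threshold. Because the embeddings failure persists until a restart, the ratio stays high rather than spiking briefly, which makes a windowed ratio a better fit than alerting on individual errors.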