CVE-2025-48944 in vLLM

Summary

by MITRE • 05/30/2025

vLLM is an inference and serving engine for large language models (LLMs). In version 0.8.0 up to but excluding 0.9.0, the vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the tools functionality is invoked. These inputs are not validated before being compiled or parsed, causing a crash of the inference worker with a single request. The worker will remain down until it is restarted. Version 0.9.0 fixes the issue.

Analysis

by VulDB Data Team • 06/02/2025

CVE-2025-48944 affects vLLM, an inference and serving engine for large language models that handles requests through the OpenAPI /v1/chat/completions endpoint. The issue manifests when the tools functionality is invoked, where an input validation gap lets a request disrupt the service. The vulnerability exists in versions 0.8.0 up to but excluding 0.9.0, so any deployment in that range can have its inference worker crashed by malicious or malformed input.

The technical flaw stems from insufficient validation of the "pattern" and "type" fields supplied with a tools invocation. When these fields contain unexpected or malformed data, the backend passes them on to be compiled or parsed without checking them first. The resulting failure is not caught as an ordinary request error. Instead it terminates the inference worker process. Because the crash happens at the parsing layer, on input taken directly from the request, it is particularly dangerous in production environments where service availability is critical.
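The kind of check that is missing can be illustrated with a minimal Python sketch. It is not the fix shipped in vLLM 0.9.0. The helper name `find_schema_problems`, the `ALLOWED_TYPES` set, and the shape of the schema are assumptions made for illustration, and Python's `re` module only approximates whatever regex dialect the backend actually compiles.

```python
import re

# JSON Schema primitive type names. Which subset a given backend
# supports is an assumption here, not something the advisory states.
ALLOWED_TYPES = {"string", "number", "integer", "boolean", "object", "array", "null"}


def find_schema_problems(schema, path="parameters"):
    """Collect problems in a tool parameter schema (hypothetical helper).

    Walks nested dicts and lists and checks every "type" and "pattern"
    keyword it meets. This is not a full JSON Schema validator.
    """
    problems = []
    if isinstance(schema, list):
        for index, item in enumerate(schema):
            problems += find_schema_problems(item, f"{path}[{index}]")
        return problems
    if not isinstance(schema, dict):
        return problems
    for key, value in schema.items():
        child = f"{path}.{key}"
        if key == "properties" and isinstance(value, dict):
            # Keys under "properties" are property names, not keywords,
            # so a property called "type" or "pattern" is legitimate.
            for name, sub_schema in value.items():
                problems += find_schema_problems(sub_schema, f"{child}.{name}")
        elif key == "type":
            names = value if isinstance(value, list) else [value]
            for name in names:
                if not isinstance(name, str) or name not in ALLOWED_TYPES:
                    problems.append(f"{child}: unsupported type {name!r}")
        elif key == "pattern":
            if not isinstance(value, str):
                problems.append(f"{child}: pattern must be a string")
            else:
                try:
                    re.compile(value)  # reject before any backend sees it
                except re.error as exc:
                    problems.append(f"{child}: invalid regex ({exc})")
        else:
            problems += find_schema_problems(value, child)
    return problems
```

A caller would reject the request with a client error when the returned list is non-empty, instead of forwarding the schema to the worker. The sketch also inspects values under keywords such as "enum" or "default", so it may flag literal data that happens to contain a "type" key. A production check would need to be schema-aware.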

The operational impact is a loss of availability. A single malicious request can crash the inference worker, and the worker remains down until it is restarted. This creates a denial of service condition for every application that relies on the affected vLLM instance for language model inference, and the outage can propagate to downstream services that depend on it. The crash occurs during request processing rather than during initialization or configuration, so it can be triggered at any time while the endpoint is reachable.

The root cause of this vulnerability aligns with CWE-20, "Improper Input Validation," a weakness in which input is not properly validated before being processed. This weakness allows attackers to submit malformed data that causes unexpected behavior in the application. From an ATT&CK perspective, the vulnerability maps to T1499.004, "Endpoint Denial of Service: Application or System Exploitation," as it can be leveraged to cause downtime through worker process crashes. Exploitation requires only a crafted API request that appears legitimate but carries input designed to trigger the crash condition.

The remediation for CVE-2025-48944 is straightforward but requires prompt attention. Organizations using vLLM versions from 0.8.0 up to but excluding 0.9.0 should upgrade to version 0.9.0 or later, which fixes the issue. Based on the description of the flaw, the fix amounts to validating the "pattern" and "type" fields before any compilation or parsing occurs, so that malformed inputs are rejected rather than processed. Security teams should also consider request rate limiting and input validation at the API gateway level as additional defensive measures. Monitoring for unusual API requests and unexpected worker process restarts can help detect exploitation attempts, and audits of the inference pipeline should verify that similar validation gaps do not exist in other input processing components.
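To check whether a deployment falls inside the affected range, a version comparison along the following lines may help. This is a minimal sketch using only the Python standard library. The function name `is_affected` is hypothetical, and the parser deliberately ignores suffixes such as ".post1" or "rc1", which is an assumption that may misclassify pre-releases at the range boundaries.

```python
import re


def is_affected(version: str) -> bool:
    """Return True if `version` lies in the range 0.8.0 <= v < 0.9.0.

    Only the leading numeric components are compared. Suffixes such as
    ".post1" or "rc1" are ignored (a simplifying assumption).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part or 0) for part in match.groups())
    return (0, 8, 0) <= (major, minor, patch) < (0, 9, 0)
```

The installed version can be read with `importlib.metadata.version("vllm")` and passed to this function. For anything beyond a quick check, a full PEP 440 parser is the safer choice.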

Responsible

GitHub M

Reservation

05/28/2025

Disclosure

05/30/2025

Moderation

accepted

CPE

ready

EPSS

0.00449

KEV

no

Activities

very low
