| Title | LM-Sys FastChat <= 0.2.36 Denial of Service (CWE-400) |
|---|
| Description |
# Technical Details
A Denial of Service (DoS) vulnerability exists in the model worker API endpoints (`/worker_generate` and `/worker_get_embeddings`) of FastChat due to synchronous blocking functions being executed directly on the main asyncio event loop thread.
While commit `ff66426` patched this issue in `base_model_worker.py`'s `api_generate()` by wrapping the blocking inference call with `asyncio.to_thread()`, the fix was incomplete. Three other identical occurrences in `multi_model_worker.py`, `base_model_worker.py` (in `api_get_embeddings()`), and `huggingface_api_worker.py` were missed.
# Vulnerable Code
Files: `fastchat/serve/multi_model_worker.py`, `fastchat/serve/base_model_worker.py`, `fastchat/serve/huggingface_api_worker.py`
Methods: `api_generate`, `api_get_embeddings`
Why: These are `async def` FastAPI route handlers, which are expected to be non-blocking. However, they directly execute long-running synchronous logic, such as local GPU inference via `worker.generate_gate()` or `worker.get_embeddings()`, and synchronous network requests via `HuggingfaceApiWorker.generate_gate()`. Because an `async def` handler runs on the event loop thread, calling these functions without offloading them freezes the single-threaded asyncio loop for the duration of the call.
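
The effect described above, and the `asyncio.to_thread()` pattern used by the partial fix, can be illustrated with a minimal standard-library sketch. It is not FastChat code: `blocking_inference`, `count_ticks`, and `run_handler` are hypothetical names, and `time.sleep()` stands in for a synchronous call such as `worker.generate_gate()`. It assumes Python 3.9 or later, where `asyncio.to_thread()` is available.

```python
import asyncio
import time


def blocking_inference(seconds: float) -> str:
    """Hypothetical stand-in for a synchronous call such as worker.generate_gate()."""
    time.sleep(seconds)
    return "done"


async def count_ticks(stop: asyncio.Event) -> int:
    """Count how often the event loop gets a chance to run this coroutine."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def run_handler(offload: bool, seconds: float = 0.2) -> int:
    """Simulate a route handler and return how many ticks ran while it worked."""
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # yield once so the ticker can start

    if offload:
        # Patched pattern: the blocking call runs in a worker thread,
        # so the event loop keeps serving other coroutines.
        await asyncio.to_thread(blocking_inference, seconds)
    else:
        # Vulnerable pattern: the blocking call runs on the event loop thread.
        blocking_inference(seconds)

    stop.set()
    return await ticker


blocked = asyncio.run(run_handler(offload=False))
offloaded = asyncio.run(run_handler(offload=True))
print(f"ticks while blocked: {blocked}, ticks while offloaded: {offloaded}")
```

In the blocked case the ticker only gets its initial turn before the handler monopolizes the loop, whereas the offloaded case lets it keep running throughout. In a real worker the ticker corresponds to every other request the server is supposed to be handling.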
# Reproduction
1. Start an instance of the FastChat back-end model worker (e.g., `base_model_worker.py`).
2. Run a health check (e.g., POST `/worker_get_status`) to establish a fast baseline (e.g., a 10 ms response).
3. Send an unauthenticated POST request to the vulnerable endpoint `/worker_get_embeddings` that causes the model to perform a slow inference task.
4. While that request is still in flight, run the health check again. It will not return until the inference task completes, demonstrating that the asyncio event loop is frozen and the server cannot process any concurrent connections, parse new requests, or respond to controller heartbeats.
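
The steps above could be scripted roughly as follows, using only the standard library. This is a sketch under stated assumptions rather than the submitter's proof of concept: the worker address in `WORKER_URL`, the `input` field of the embeddings payload, and the helper names `timed_post` and `measure_blocking` are all illustrative and are not taken from the report.

```python
import json
import threading
import time
import urllib.request

# Assumption: address of a locally running model worker; adjust as needed.
WORKER_URL = "http://localhost:21002"


def timed_post(url: str, payload: dict, timeout: float = 120.0) -> float:
    """POST a JSON payload and return the elapsed wall-clock time in seconds."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def measure_blocking(worker_url: str, slow_payload: dict, delay: float = 0.5):
    """Return (baseline, during): health-check latency before and during a slow request."""
    status_url = f"{worker_url}/worker_get_status"

    # Step 2: baseline health check while the worker is idle.
    baseline = timed_post(status_url, {})

    # Step 3: fire the slow request in a background thread.
    slow = threading.Thread(
        target=timed_post,
        args=(f"{worker_url}/worker_get_embeddings", slow_payload),
        daemon=True,
    )
    slow.start()
    time.sleep(delay)  # give the slow request time to reach the handler

    # Step 4: health check while the slow request is still in flight.
    during = timed_post(status_url, {})
    slow.join()
    return baseline, during


# Example usage against a running worker (payload shape is an assumption):
# baseline, during = measure_blocking(WORKER_URL, {"input": ["long text " * 200] * 64})
# print(f"baseline={baseline:.3f}s during={during:.3f}s")
```

On an affected worker, `during` would be expected to approach the duration of the embedding request, while on a worker that offloads the blocking call it should stay close to `baseline`.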
# Impact
- Denial of Service (a single HTTP request freezes the model worker completely for the duration of inference).
- Controller Deregistration (the frozen event loop prevents the worker from sending heartbeats, causing the controller to deregister the worker and take all models hosted by that worker instance offline). |
|---|
| Source | ⚠️ https://gist.github.com/YLChen-007/87216a2d97a882d619e11dc67cd473b5 |
|---|
| User | Eric-f (UID 96873) |
|---|
| Submission | 29.03.2026 05:42 (23 days ago) |
|---|
| Moderation | 19.04.2026 17:59 (22 days later) |
|---|
| Status | Accepted |
|---|
| VulDB Entry | 358242 [lm-sys fastchat up to 0.2.36 Worker API Endpoint api_generate Denial of Service] |
|---|
| Points | 20 |
|---|