Soumettre #792227: LM-Sys FastChat <= 0.2.36 Denial of Service (CWE-400)information

TitreLM-Sys FastChat <= 0.2.36 Denial of Service (CWE-400)
Description # Technical Details A Denial of Service (DoS) vulnerability exists in the model worker API endpoints (`/worker_generate` and `/worker_get_embeddings`) of FastChat due to synchronous blocking functions being executed directly on the main asyncio event loop thread. While commit `ff66426` patched this issue in `base_model_worker.py`'s `api_generate()` by wrapping the blocking inference call with `asyncio.to_thread()`, the fix was incomplete. Three other identical occurrences in `multi_model_worker.py`, `base_model_worker.py` (in `api_get_embeddings()`), and `huggingface_api_worker.py` were missed. # Vulnerable Code File: fastchat/serve/multi_model_worker.py, fastchat/serve/base_model_worker.py, fastchat/serve/huggingface_api_worker.py Method: api_generate, api_get_embeddings Why: These are `async def` FastAPI route handlers designed to be non-blocking. However, they directly execute intense, synchronous blocking logic—such as local GPU inference via `worker.generate_gate()` or `worker.get_embeddings()`, and synchronous network requests via `HuggingfaceApiWorker.generate_gate()`. Running these directly on the main thread entirely freezes the single-threaded asyncio loop for the duration of inference. # Reproduction 1. Start an instance of the FastChat back-end model worker (e.g., `base_model_worker.py`). 2. Run a concurrent health check (e.g., POST `/worker_get_status`) to establish a fast baseline (e.g., 10ms response). 3. Send an unauthenticated POST request to the vulnerable endpoint `/worker_get_embeddings` instructing the model to perform a slow inference task. 4. Immediately run the health check again. The health check will hang indefinitely until the inference task completes, demonstrating that the asyncio event loop is frozen and the server cannot process any concurrent connections, parse new requests, or respond to controller heartbeats. # Impact - Denial of Service (A single HTTP request freezes the model worker completely for the duration of inference). - Controller Deregistration (the frozen event loop prevents the worker from sending heartbeats, causing the controller to effectively kick the worker offline, bringing down all models currently hosted by that worker instance).
La source⚠️ https://gist.github.com/YLChen-007/87216a2d97a882d619e11dc67cd473b5
Utilisateur
 Eric-f (UID 96873)
Soumission29/03/2026 05:42 (il y a 23 jours)
Modérer19/04/2026 17:59 (22 days later)
StatutAccepté
Entrée VulDB358242 [lm-sys fastchat jusqu’à 0.2.36 Worker API Endpoint api_generate déni de service]
Points20

Do you know our Splunk app?

Download it now for free!