Submission #792227: LM-Sys FastChat <= 0.2.36 Denial of Service (CWE-400) Info

Title	LM-Sys FastChat <= 0.2.36 Denial of Service (CWE-400)
Description	# Technical Details

A Denial of Service (DoS) vulnerability exists in the model worker API endpoints (`/worker_generate` and `/worker_get_embeddings`) of FastChat because synchronous blocking functions are executed directly on the main asyncio event loop thread. Commit `ff66426` patched this issue in `api_generate()` of `base_model_worker.py` by wrapping the blocking inference call in `asyncio.to_thread()`, but the fix was incomplete: three other identical occurrences, in `multi_model_worker.py`, in `base_model_worker.py` (`api_get_embeddings()`), and in `huggingface_api_worker.py`, were missed.

# Vulnerable Code

File: fastchat/serve/multi_model_worker.py, fastchat/serve/base_model_worker.py, fastchat/serve/huggingface_api_worker.py
Method: api_generate, api_get_embeddings
Why: These are `async def` FastAPI route handlers, which are expected to be non-blocking. However, they directly execute heavy synchronous logic: local GPU inference via `worker.generate_gate()` or `worker.get_embeddings()`, and synchronous network requests via `HuggingfaceApiWorker.generate_gate()`. Because an `async def` handler runs on the event loop thread, each such call freezes the single-threaded asyncio loop for the full duration of inference.

# Reproduction

1. Start an instance of the FastChat back-end model worker (e.g., `base_model_worker.py`).
2. Run a health check (e.g., POST `/worker_get_status`) to establish a fast baseline (e.g., a 10 ms response).
3. Send an unauthenticated POST request to the vulnerable endpoint `/worker_get_embeddings` that makes the model perform a slow inference task.
4. Immediately run the health check again. It hangs until the inference task completes, demonstrating that the asyncio event loop is frozen and the server cannot accept concurrent connections, parse new requests, or respond to controller heartbeats.

# Impact

- Denial of Service: a single HTTP request freezes the model worker completely for the duration of inference.
- Controller Deregistration: the frozen event loop prevents the worker from sending heartbeats, so the controller effectively takes the worker offline, bringing down all models currently hosted by that worker instance.
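
The blocking behaviour described above can be reproduced without FastChat. The following is a minimal sketch, not FastChat code: `time.sleep` stands in for the synchronous inference call, and the names `blocking_inference`, `handler_blocking`, `handler_offloaded`, and `health_check_latency` are hypothetical. It contrasts the vulnerable pattern with the `asyncio.to_thread()` wrapping used by the partial fix (requires Python 3.9 or later).

```python
import asyncio
import time


def blocking_inference(seconds: float) -> str:
    """Hypothetical stand-in for a synchronous call such as generate_gate()."""
    time.sleep(seconds)  # blocks the calling thread, as inference would
    return "done"


async def handler_blocking(seconds: float) -> str:
    # Vulnerable pattern: synchronous work runs on the event loop thread.
    return blocking_inference(seconds)


async def handler_offloaded(seconds: float) -> str:
    # Patched pattern: synchronous work runs in a worker thread,
    # so the event loop stays free to serve other coroutines.
    return await asyncio.to_thread(blocking_inference, seconds)


async def health_check_latency(handler, seconds: float = 0.5) -> float:
    """Start `handler`, then measure how long a 10 ms 'health check' takes."""
    start = time.perf_counter()
    task = asyncio.create_task(handler(seconds))
    await asyncio.sleep(0.01)  # stand-in for a fast status request
    latency = time.perf_counter() - start
    await task  # let the simulated inference finish before returning
    return latency


if __name__ == "__main__":
    blocked = asyncio.run(health_check_latency(handler_blocking))
    offloaded = asyncio.run(health_check_latency(handler_offloaded))
    print(f"health check with blocking handler:  {blocked:.3f}s")
    print(f"health check with offloaded handler: {offloaded:.3f}s")
```

With the blocking handler, the health check cannot complete until the simulated inference returns, so its latency is roughly the inference time; with the offloaded handler it stays close to the 10 ms baseline.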
Source	⚠️ https://gist.github.com/YLChen-007/87216a2d97a882d619e11dc67cd473b5
User	Eric-f (UID 96873)
Submitted	2026-03-29 05:42 (24 days ago)
Moderation	2026-04-19 17:59 (22 days later)
Status	Accepted
VulDB entry	358242 [lm-sys fastchat up to 0.2.36 Worker API Endpoint api_generate denial of service]
Points	20
