Submission #792227: LM-Sys FastChat <= 0.2.36 Denial of Service (CWE-400)

Title	LM-Sys FastChat <= 0.2.36 Denial of Service (CWE-400)
Description
# Technical Details

A Denial of Service (DoS) vulnerability exists in the model worker API endpoints (`/worker_generate` and `/worker_get_embeddings`) of FastChat because synchronous blocking functions are executed directly on the main asyncio event loop thread. Commit `ff66426` patched this issue in `base_model_worker.py`'s `api_generate()` by wrapping the blocking inference call with `asyncio.to_thread()`, but the fix was incomplete. Three other identical occurrences, in `multi_model_worker.py`, `base_model_worker.py` (in `api_get_embeddings()`), and `huggingface_api_worker.py`, were missed.

# Vulnerable Code

File: fastchat/serve/multi_model_worker.py, fastchat/serve/base_model_worker.py, fastchat/serve/huggingface_api_worker.py
Method: api_generate, api_get_embeddings
Why: These are `async def` FastAPI route handlers, which are expected to be non-blocking. However, they directly execute heavy synchronous logic: local GPU inference via `worker.generate_gate()` or `worker.get_embeddings()`, and synchronous network requests via `HuggingfaceApiWorker.generate_gate()`. Because the call runs on the event loop thread, the single-threaded asyncio loop is frozen for the full duration of inference.

# Reproduction

1. Start an instance of the FastChat back-end model worker (e.g., `base_model_worker.py`).
2. Run a health check (e.g., POST `/worker_get_status`) to establish a fast baseline (e.g., a 10 ms response).
3. Send an unauthenticated POST request to the vulnerable endpoint `/worker_get_embeddings` instructing the model to perform a slow inference task.
4. Immediately run the health check again. It hangs until the inference task completes, demonstrating that the asyncio event loop is frozen and the server cannot process any concurrent connections, parse new requests, or respond to controller heartbeats.

# Impact

- Denial of Service (a single HTTP request freezes the model worker completely for the duration of inference).
- Controller Deregistration (the frozen event loop prevents the worker from sending heartbeats, so the controller effectively takes the worker offline, bringing down all models currently hosted by that worker instance).
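The blocking behavior and the `asyncio.to_thread()` mitigation described above can be illustrated with a minimal, standard-library-only sketch. It does not use FastChat or FastAPI: `blocking_inference`, `handler_blocking`, `handler_offloaded`, and `health_check_latency` are hypothetical names introduced for illustration, `time.sleep()` stands in for synchronous inference, and a short `asyncio.sleep()` stands in for a fast status request. Python 3.9 or later is assumed.

```python
import asyncio
import time


def blocking_inference(seconds: float) -> str:
    """Hypothetical stand-in for a synchronous call such as worker.get_embeddings()."""
    time.sleep(seconds)  # blocks whichever thread calls it
    return "done"


async def handler_blocking(seconds: float) -> str:
    # Vulnerable pattern: the synchronous call runs on the event loop thread.
    return blocking_inference(seconds)


async def handler_offloaded(seconds: float) -> str:
    # Mitigation pattern: run the synchronous call in a worker thread
    # so the event loop stays free to serve other coroutines.
    return await asyncio.to_thread(blocking_inference, seconds)


async def health_check_latency(handler, seconds: float = 0.5) -> float:
    """Start a slow request, then time a trivial coroutine that should take ~10 ms."""
    slow = asyncio.create_task(handler(seconds))
    start = time.perf_counter()
    await asyncio.sleep(0.01)  # stands in for a fast status request
    elapsed = time.perf_counter() - start
    await slow  # let the slow request finish before returning
    return elapsed


async def main() -> tuple[float, float]:
    blocked = await health_check_latency(handler_blocking)
    offloaded = await health_check_latency(handler_offloaded)
    return blocked, offloaded


if __name__ == "__main__":
    blocked, offloaded = asyncio.run(main())
    print(f"blocking handler:  health check took {blocked:.3f}s")
    print(f"offloaded handler: health check took {offloaded:.3f}s")
```

With the blocking handler, the simulated health check cannot resume until the half-second "inference" returns. With the offloaded handler it should complete in roughly the baseline time. Exact timings depend on the machine and scheduler.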
Source	⚠️ https://gist.github.com/YLChen-007/87216a2d97a882d619e11dc67cd473b5
User	Eric-f (UID 96873)
Submitted	2026-03-29 05:42 AM (23 days ago)
Moderation	2026-04-19 05:59 PM (22 days later)
Status	Accepted
VulDB entry	358242 [lm-sys fastchat up to 0.2.36 Worker API Endpoint api_generate denial of service]
Points	20
