Submit #792227: LM-Sys FastChat <= 0.2.36 Denial of Service (CWE-400)

Title: LM-Sys FastChat <= 0.2.36 Denial of Service (CWE-400)
Description:

# Technical Details

A Denial of Service (DoS) vulnerability exists in the model worker API endpoints (`/worker_generate` and `/worker_get_embeddings`) of FastChat because synchronous blocking functions are executed directly on the main asyncio event loop thread. Commit `ff66426` patched this issue in `api_generate()` of `base_model_worker.py` by wrapping the blocking inference call with `asyncio.to_thread()`, but the fix was incomplete. Three other identical occurrences were missed: in `multi_model_worker.py`, in `base_model_worker.py` (`api_get_embeddings()`), and in `huggingface_api_worker.py`.

# Vulnerable Code

- File: `fastchat/serve/multi_model_worker.py`, `fastchat/serve/base_model_worker.py`, `fastchat/serve/huggingface_api_worker.py`
- Method: `api_generate`, `api_get_embeddings`
- Why: These are `async def` FastAPI route handlers, which are expected to be non-blocking. However, they directly execute heavy synchronous logic: local GPU inference via `worker.generate_gate()` or `worker.get_embeddings()`, and synchronous network requests via `HuggingfaceApiWorker.generate_gate()`. Running these calls on the event loop thread freezes the single-threaded asyncio loop for the duration of inference.

# Reproduction

1. Start an instance of the FastChat back-end model worker (e.g., `base_model_worker.py`).
2. Run a health check (e.g., POST `/worker_get_status`) to establish a fast baseline (e.g., a 10 ms response).
3. Send an unauthenticated POST request to the vulnerable endpoint `/worker_get_embeddings` instructing the model to perform a slow inference task.
4. Immediately run the health check again, while the inference request is still in flight.

The second health check hangs until the inference task completes. This demonstrates that the asyncio event loop is frozen and the server cannot process concurrent connections, parse new requests, or respond to controller heartbeats.

# Impact

- Denial of Service: a single HTTP request freezes the model worker completely for the duration of inference.
- Controller Deregistration: the frozen event loop prevents the worker from sending heartbeats, so the controller effectively kicks the worker offline, bringing down all models currently hosted by that worker instance.
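
The event-loop stall described under Technical Details can be illustrated without FastChat itself. The following is a minimal sketch in plain asyncio, not FastChat code: `slow_inference`, `handler_blocking`, `handler_offloaded`, and `max_tick_gap` are hypothetical names, and `time.sleep()` stands in for a blocking inference call. It compares a handler that blocks the loop with one that offloads the call via `asyncio.to_thread()` (Python 3.9+), and measures how long a concurrent 10 ms ticker is starved in each case.

```python
import asyncio
import time


def slow_inference(seconds: float = 0.3) -> str:
    """Hypothetical stand-in for a blocking call such as worker.get_embeddings()."""
    time.sleep(seconds)  # blocks whichever thread calls it
    return "done"


async def handler_blocking() -> str:
    # Vulnerable pattern: the blocking call runs on the event loop thread.
    return slow_inference()


async def handler_offloaded() -> str:
    # Pattern the report attributes to the patched api_generate():
    # run the blocking call in a worker thread and await its result.
    return await asyncio.to_thread(slow_inference)


async def max_tick_gap(handler) -> float:
    """Run `handler` next to a 10 ms ticker; return the longest gap between ticks."""
    gaps = []
    stop = asyncio.Event()

    async def ticker() -> None:
        last = time.perf_counter()
        while not stop.is_set():
            await asyncio.sleep(0.01)
            now = time.perf_counter()
            gaps.append(now - last)
            last = now

    task = asyncio.create_task(ticker())
    await asyncio.sleep(0.05)  # let the ticker record a few baseline ticks
    await handler()
    stop.set()
    await task
    return max(gaps)


if __name__ == "__main__":
    blocked = asyncio.run(max_tick_gap(handler_blocking))
    offloaded = asyncio.run(max_tick_gap(handler_offloaded))
    print(f"longest stall, blocking handler:  {blocked * 1000:.0f} ms")
    print(f"longest stall, offloaded handler: {offloaded * 1000:.0f} ms")
```

In this sketch the blocking handler starves the ticker for roughly the full duration of `slow_inference`, while the offloaded handler leaves the ticker running close to its 10 ms cadence. Offloading keeps the loop responsive but does not limit how many slow requests can be queued, so it addresses only the stall described here.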
Source: ⚠️ https://gist.github.com/YLChen-007/87216a2d97a882d619e11dc67cd473b5
User: Eric-f (UID 96873)
Submission: 29/03/2026 05:42 (23 days ago)
Moderation: 19/04/2026 17:59 (22 days later)
Status: Accepted
VulDB entry: 358242 [lm-sys fastchat up to 0.2.36 Worker API Endpoint api_generate denial of service]
Points: 20
