Submit #792228: LM-Sys FastChat <= 0.2.36 Content Moderation Bypass (CWE-670)

Title: LM-Sys FastChat <= 0.2.36 Content Moderation Bypass (CWE-670)
Description

# Technical Details

A Content Moderation Bypass vulnerability exists in multiple Arena modes of the FastChat platform (such as Anonymous Arena and the Vision Arenas). The content moderation filter fails to correctly build the conversation context string before passing it to `moderation_filter()`. In two modules, a copy-paste error causes the application to read `states[0].conv.get_prompt()` twice instead of reading Model B's state at `states[1]`. In a third module (Vision Anonymous Arena), the conversation context argument is mistakenly replaced with the user's current short input message (`text`), so no dialogue history is checked.

# Vulnerable Code

Files: `fastchat/serve/gradio_block_arena_anony.py`, `fastchat/serve/gradio_block_arena_vision_named.py`, `fastchat/serve/gradio_block_arena_vision_anony.py`

Method: `add_text`

Why:

1. In `gradio_block_arena_anony.py` and `gradio_block_arena_vision_named.py`, `all_conv_text_right = states[0].conv.get_prompt()` is used instead of `states[1]`, so Model B's generated history is completely omitted from `all_conv_text`.
2. In `gradio_block_arena_vision_anony.py`, `moderate_input(state0, text, text, ...)` is called where the third argument is supposed to be `all_conv_text`. Because it passes `text` (just the current prompt), the moderation filter never sees any history.

# Reproduction

1. Navigate to the FastChat Anonymous Arena (Battle) or Vision Anonymous Arena web interface.
2. Initiate a multi-turn conversation. First, prompt both models to generate content that is close to crossing moderation boundaries but passes because of its context.
3. In the second turn, send a short, benign follow-up prompt (e.g., "tell me more" or "continue").
4. The backend evaluates the short prompt without Model B's prior context (or without any context in Vision Anonymous Arena). Because the moderation filter drops that part of the history window, severe policy-violating context can persist indefinitely without being flagged.

# Impact

- Content Moderation Bypass: the filter is completely blind to Model B's history in some arena layouts, and completely blind to all history in Vision Anonymous Arena.
- Safety Policy Violations: users can keep generating unrestricted harmful text or imagery across multi-turn sessions without triggering the filters.
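
The two failure patterns described above can be illustrated with a minimal, self-contained sketch. This is not the FastChat source: `FakeConv`, `FakeState`, `toy_filter`, and the two `build_context_*` helpers are hypothetical stand-ins, and only the `states[0]` versus `states[1]` indexing and the "pass `text` instead of the full context" mistake are taken from the report.

```python
from dataclasses import dataclass, field


@dataclass
class FakeConv:
    """Hypothetical stand-in for a conversation object exposing get_prompt()."""
    turns: list = field(default_factory=list)

    def get_prompt(self) -> str:
        return "\n".join(self.turns)


@dataclass
class FakeState:
    """Hypothetical stand-in for the per-model state in a side-by-side arena."""
    conv: FakeConv


def build_context_buggy(states, text):
    # Pattern described in the report: index 0 is read for both sides,
    # so Model B's history never reaches the filter.
    left = states[0].conv.get_prompt()
    right = states[0].conv.get_prompt()  # should be states[1]
    return left + right + "\nuser: " + text


def build_context_fixed(states, text):
    # Assumed correction: read Model A from states[0] and Model B from states[1].
    left = states[0].conv.get_prompt()
    right = states[1].conv.get_prompt()
    return left + right + "\nuser: " + text


def toy_filter(context):
    """Toy keyword check; the real moderation_filter() is not reproduced here."""
    return "FORBIDDEN" in context


states = [
    FakeState(FakeConv(["user: hi", "assistant: hello"])),
    FakeState(FakeConv(["user: hi", "assistant: FORBIDDEN output"])),
]
follow_up = "continue"

buggy_flag = toy_filter(build_context_buggy(states, follow_up))
fixed_flag = toy_filter(build_context_fixed(states, follow_up))
# Vision Anonymous Arena pattern: only the current prompt is checked.
text_only_flag = toy_filter(follow_up)

print(buggy_flag, fixed_flag, text_only_flag)
```

In this toy setup, only the corrected context builder lets the filter see Model B's earlier output; both the duplicated-index variant and the text-only variant let the benign follow-up through unflagged.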
Source: https://gist.github.com/YLChen-007/e45039d23e698222d887ee09735d9d36
User: Eric-f (UID 96873)
Submission: 29/03/2026 05:43 (23 days ago)
Moderation: 19/04/2026 17:59 (22 days later)
Status: Accepted
VulDB entry: 358243 [lm-sys fastchat up to 0.2.36 Arena Side-by-Side View add_text privilege escalation]
Points: 20
