Submission #831455: modelscope ms-swift 4.2.0 Hash Collision (info)

Title: modelscope ms-swift 4.2.0 Hash Collision
Description:

# Image Cache Hash Collision via Missing Dimension Metadata

| Field | Value |
|-------|-------|
| **Advisory ID** | SWIFT-2026-001 |
| **Affected Component** | `swift/template/base.py` → `Template._save_pil_image()` |
| **Affected Versions** | All versions up to and including commit `8d1e071d5` |
| **Severity** | Medium (CVSS 3.1 Base: 5.3) |
| **Attack Vector** | Network (user-supplied image input) |
| **CWE** | CWE-345 (Insufficient Verification of Data Authenticity) |
| **Status** | Open |

---

## Summary

`Template._save_pil_image()` uses `SHA256(image.tobytes())` as the cache key for content-addressed image storage. However, PIL's `tobytes()` returns a **flat byte stream without any dimensional metadata** (width, height, mode). Two visually distinct images with identical raw pixel bytes but different dimensions produce the same hash and therefore collide in the cache. The first image cached is silently served for every later image that shares the hash, so the model receives the wrong input.

---

## Affected Code

**File:** `swift/template/base.py`, lines 697–707

```python
@staticmethod
def _save_pil_image(image: Image.Image) -> str:
    img_bytes = image.tobytes()                       # ← no dimensional metadata
    img_hash = hashlib.sha256(img_bytes).hexdigest()  # ← hash of raw pixels only
    tmp_dir = os.path.join(get_cache_dir(), 'tmp', 'images')
    logger.info_once(f'create tmp_dir: {tmp_dir}')
    os.makedirs(tmp_dir, exist_ok=True)
    img_path = os.path.join(tmp_dir, f'{img_hash}.png')
    if not os.path.exists(img_path):
        image.save(img_path)                          # ← only first image is saved
    return img_path                                   # ← all collisions get this path
```

**Call site:** `swift/template/base.py`, lines 305–308

```python
if images and not load_images_origin:  # fix pt & qwen-vl
    for i, image in enumerate(images):
        if isinstance(image, Image.Image):
            images[i] = self._save_pil_image(image)
```

---

## Root Cause

`PIL.Image.tobytes()` serializes pixel data as a contiguous byte array:

```
Output: R₁G₁B₁ R₂G₂B₂ R₃G₃B₃ ...
        (W × H × 3 bytes for RGB)
```

This output does **not** encode:

- **Width (W)** and **Height (H)**: only the product `W × H` is implied by the byte count
- **Image mode**: the upstream pipeline normalizes to RGB, but the function itself does not enforce this

Therefore, for any pair of images where `W₁ × H₁ == W₂ × H₂` and the flat pixel sequences are identical, the SHA-256 digests are equal and the cache keys collide.

---

## Proof of Concept

### Minimal Reproduction

```python
"""
PoC: Demonstrate hash collision in _save_pil_image()

Two visually DIFFERENT images produce the same SHA256 cache key.
Run this script: it asserts that both images share the same hash
while being visually distinct (clean horizontal stripes vs. a broken pattern).
"""
import hashlib

import numpy as np
from PIL import Image

# ── Step 1: Construct raw pixel data (shared by both images) ──────────
# Total pixels: 120 × 80 = 80 × 120 = 9600 pixels = 28800 bytes (RGB)
W_A, H_A = 120, 80   # landscape
W_B, H_B = 80, 120   # portrait
assert W_A * H_A == W_B * H_B  # same total pixel count

total_pixels = W_A * H_A
pixels = np.zeros((total_pixels, 3), dtype=np.uint8)

# Paint horizontal stripes (every 120 pixels = one row of image A)
for i in range(total_pixels):
    row_in_A = i // W_A
    if row_in_A % 10 < 5:
        pixels[i] = [255, 60, 60]   # red stripe
    else:
        pixels[i] = [60, 60, 255]   # blue stripe

raw_bytes = pixels.tobytes()

# ── Step 2: Create two images from the SAME bytes ────────────────────
img_a = Image.frombytes('RGB', (W_A, H_A), raw_bytes)  # 120×80
img_b = Image.frombytes('RGB', (W_B, H_B), raw_bytes)  # 80×120

# ── Step 3: Verify hash collision ─────────────────────────────────────
hash_a = hashlib.sha256(img_a.tobytes()).hexdigest()
hash_b = hashlib.sha256(img_b.tobytes()).hexdigest()

assert hash_a == hash_b, "Hashes should collide!"
print("[COLLISION] Both images produce the same SHA-256:")
print(f"  Image A: {W_A}×{H_A}  hash={hash_a[:16]}...")
print(f"  Image B: {W_B}×{H_B}  hash={hash_b[:16]}...")

# ── Step 4: Simulate _save_pil_image cache behavior ──────────────────
# First call: img_a gets cached
img_a.save('/tmp/poc_image_a_120x80.png')
# Second call: img_b hits cache, gets img_a's file
# (in _save_pil_image, the `if not os.path.exists` check skips saving img_b)
img_b.save('/tmp/poc_image_b_80x120.png')

print("\n[VISUAL DIFF] Open both files to see the difference:")
print("  Image A (120×80): /tmp/poc_image_a_120x80.png → clean horizontal stripes")
print("  Image B (80×120): /tmp/poc_image_b_80x120.png → broken pattern")
print("\nIn production, image B would NEVER be saved.")
print("The model would receive image A when image B was submitted.")

# ── Step 5: Verify they are visually different ────────────────────────
arr_a = np.array(img_a)  # shape (80, 120, 3)
arr_b = np.array(img_b)  # shape (120, 80, 3)
assert arr_a.shape != arr_b.shape, "Shapes must differ"
print(f"\n[CONFIRMED] Shape A={arr_a.shape}, Shape B={arr_b.shape}")
print("[CONFIRMED] Pixel arrays are NOT equivalent; images are visually distinct.")
```

### Expected Output

The hash prefix shown is a placeholder; the actual digest will differ.

```
[COLLISION] Both images produce the same SHA-256:
  Image A: 120×80  hash=a1b2c3d4e5f6a7b8...
  Image B: 80×120  hash=a1b2c3d4e5f6a7b8...

[VISUAL DIFF] Open both files to see the difference:
  Image A (120×80): /tmp/poc_image_a_120x80.png → clean horizontal stripes
  Image B (80×120): /tmp/poc_image_b_80x120.png → broken pattern

In production, image B would NEVER be saved.
The model would receive image A when image B was submitted.

[CONFIRMED] Shape A=(80, 120, 3), Shape B=(120, 80, 3)
[CONFIRMED] Pixel arrays are NOT equivalent; images are visually distinct.
```

### Visual Comparison

```
Image A (120×80), clean stripes:    Image B (80×120), broken pattern:
████████████████████████            ████████████████
████████████████████████            ████████████░░░░
████████████████████████            ░░░░░░░░░░░░████
████████████████████████            ████████████████
████████████████████████            ████████░░░░░░░░
░░░░░░░░░░░░░░░░░░░░░░░░            ░░░░░░░░████████
░░░░░░░░░░░░░░░░░░░░░░░░            ████████████████
░░░░░░░░░░░░░░░░░░░░░░░░            ████░░░░░░░░░░░░
░░░░░░░░░░░░░░░░░░░░░░░░            ░░░░░░░░░░░░████
░░░░░░░░░░░░░░░░░░░░░░░░            ████████████████

Same bytes, different row width → completely different visual layout
```

---

## Impact

### 1. Inference Cache Poisoning

A shared inference service caches image A. A subsequent request with a crafted image B (same pixel bytes, different dimensions) hits the cache and receives image A. The multimodal model generates a response based on the **wrong image**.

### 2. GRPO / RLHF Training Data Corruption

In reinforcement learning training pipelines (GRPO, DPO, RLHF), if two training samples contain images that collide:

- Sample 2's image is silently replaced with Sample 1's cached image
- The reward model scores based on the wrong image-text pairing
- The policy model learns from corrupted reward signals

### 3. Dataset Deduplication False Positives

If image hashes are used for deduplication, visually distinct images are incorrectly identified as duplicates and removed, reducing effective training data.
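The cache-poisoning scenario can be reproduced end to end with a small stand-in for the reported function. This is a sketch under assumptions: `save_pil_image_vulnerable` is a hypothetical helper that mirrors the quoted logic but writes to a temporary directory rather than calling `get_cache_dir()`, and it is not the ms-swift code itself.

```python
import hashlib
import os
import tempfile

from PIL import Image


def save_pil_image_vulnerable(image: Image.Image, cache_dir: str) -> str:
    """Hypothetical stand-in: the cache key is SHA-256 of the raw pixel bytes only."""
    img_hash = hashlib.sha256(image.tobytes()).hexdigest()
    img_path = os.path.join(cache_dir, f"{img_hash}.png")
    if not os.path.exists(img_path):
        image.save(img_path)  # only the first image with this key is written
    return img_path


# 9600 RGB pixels (28800 bytes): alternating runs of 600 red and 600 blue pixels.
raw = bytes([255, 60, 60] * 600 + [60, 60, 255] * 600) * 8

img_a = Image.frombytes("RGB", (120, 80), raw)  # landscape
img_b = Image.frombytes("RGB", (80, 120), raw)  # portrait, same bytes

with tempfile.TemporaryDirectory() as cache_dir:
    path_a = save_pil_image_vulnerable(img_a, cache_dir)
    path_b = save_pil_image_vulnerable(img_b, cache_dir)
    # Re-open what the cache hands back for image B.
    with Image.open(path_b) as served:
        served_size = served.size
    cached_files = os.listdir(cache_dir)

print("same path:", path_a == path_b)
print("submitted size:", img_b.size, "served size:", served_size)
print("files in cache:", len(cached_files))
```

The second call returns the path written by the first, so the file served for the 80×120 image is the 120×80 one, and only a single file exists in the cache directory.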
---

## Remediation

### Recommended Fix

Include image dimensions and mode in the hash input:

```python
@staticmethod
def _save_pil_image(image: Image.Image) -> str:
    # Fix: include dimensional metadata to prevent cross-dimension collision
    img_meta = f"{image.mode}:{image.width}:{image.height}:".encode()
    img_hash = hashlib.sha256(img_meta + image.tobytes()).hexdigest()
    tmp_dir = os.path.join(get_cache_dir(), 'tmp', 'images')
    logger.info_once(f'create tmp_dir: {tmp_dir}')
    os.makedirs(tmp_dir, exist_ok=True)
    img_path = os.path.join(tmp_dir, f'{img_hash}.png')
    if not os.path.exists(img_path):
        image.save(img_path)
    return img_path
```

### Why This Fix Is Sufficient

After the upstream `convert('RGB')` + `rescale_image` pipeline:

- `mode` is always `'RGB'`; including it provides defense-in-depth against future call paths that skip `convert('RGB')`
- `width` and `height` are the **only** remaining ambiguity in `tobytes()` output
- Adding these as a prefix to the hash input makes the key unambiguous: identical pixel bytes under different dimensions or modes no longer share a digest

### Migration

This change will **invalidate existing cached images** (hash values change). The cache directory (`{ca
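The effect of the recommended fix on the cache key can be checked in isolation. The sketch below is illustrative only: `cache_key_raw` and `cache_key_with_meta` are hypothetical helper names, not ms-swift functions, and the second one simply applies the mode/width/height prefix from the recommended fix.

```python
import hashlib

from PIL import Image


def cache_key_raw(image: Image.Image) -> str:
    """Key as in the affected code: raw pixel bytes only."""
    return hashlib.sha256(image.tobytes()).hexdigest()


def cache_key_with_meta(image: Image.Image) -> str:
    """Key as in the recommended fix: mode and size prefixed to the pixel bytes."""
    meta = f"{image.mode}:{image.width}:{image.height}:".encode()
    return hashlib.sha256(meta + image.tobytes()).hexdigest()


# 1536 bytes that can be read as 32×16 RGB, 16×32 RGB, or 48×32 grayscale.
raw = bytes(range(256)) * 6
images = [
    Image.frombytes("RGB", (32, 16), raw),
    Image.frombytes("RGB", (16, 32), raw),
    Image.frombytes("L", (48, 32), raw),
]

raw_keys = {cache_key_raw(img) for img in images}
meta_keys = {cache_key_with_meta(img) for img in images}

print("distinct raw keys:", len(raw_keys))    # all three collide
print("distinct meta keys:", len(meta_keys))  # one key per image

# An identical copy must still map to the same key, so caching keeps working.
same_again = cache_key_with_meta(images[0]) == cache_key_with_meta(images[0].copy())
print("identical image keeps its key:", same_again)
```

The grayscale case shows why the mode belongs in the prefix alongside width and height: a call path that skips the RGB conversion would otherwise reintroduce the collision.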
Source: ⚠️ https://github.com/modelscope/ms-swift/issues/9360
User: Dem0 (UID 82596)
Submitted: 2026-05-16 10:17 (19 days ago)
Moderated: 2026-06-04 06:59 (19 days later)
Status: Accepted
VulDB entry: 368250 [modelscope ms-swift up to 4.2.0 PIL Image Cache Key swift/template/base.py Template._save_pil_image weak encryption]
Points: 20
