Submission #848737: exo-explore exo 1.0.71 Use of Weak Hash

Title: exo-explore exo 1.0.71 Use of Weak Hash
Description

Cache Poisoning via Partial-Object Hashing in Vision Feature Cache

Summary

An attacker who can submit vision inference requests to an exo cluster can craft an image whose raw pixel byte sequence matches another user's image but whose dimensions differ. The two images then share the same feature cache key and the same prefix cache content_hash, so the victim's inference silently uses vision features computed from the attacker's image. This affects all API clients sharing the same exo process.

Affected Versions

- Affected: exo version <= 0.3.70, main branch at and before commit 629c55d6ba201014ab45c48b0e9f984495a30f34
- Confirmed affected: exo version 0.3.70, commit 629c55d6ba201014ab45c48b0e9f984495a30f34
- Fixed: PR #2152 (https://github.com/exo-explore/exo/pull/2152), not yet released at the time of reporting

Existing _feature_cache entries and KVPrefixCache entries generated by affected versions are in-memory only and do not persist across restarts. A process restart clears all vulnerable cache state.

Details

The vulnerability occurs because the MLX vision engine computes SHA-256 over PIL.Image.tobytes() output and then uses the resulting digest as both a feature cache key and a prefix cache content hash. However, tobytes() returns raw pixel bytes without any dimension information, so two images with different width×height but identical pixel byte sequences are treated as the same image.

Where the Hash is Computed

The following code is from version 0.3.70, commit 629c55d6ba201014ab45c48b0e9f984495a30f34.
Feature cache key:

    # https://github.com/exo-explore/exo/blob/629c55d6ba201014ab45c48b0e9f984495a30f34/src/exo/worker/engines/mlx/vision.py#L739-L744
    def _image_cache_key(self, images: list[Base64Image]) -> str:
        h = hashlib.sha256()
        for img in images:
            pil = decode_base64_image(img)
            h.update(pil.tobytes())
        return h.hexdigest()

Prefix cache content hash:

    # https://github.com/exo-explore/exo/blob/629c55d6ba201014ab45c48b0e9f984495a30f34/src/exo/worker/engines/mlx/vision.py#L711-L712
    img = decode_base64_image(images[i])
    region.content_hash = hashlib.sha256(img.tobytes()).hexdigest()

Both call decode_base64_image(), which converts the image to RGB mode via img.convert("RGB") before returning. The conversion resolves palette-mode (P mode) ambiguity but does not embed dimension metadata into the byte stream.

What Fields Are Included or Excluded

The digest includes:

- Raw RGB pixel byte sequence (PIL.Image.tobytes() after convert("RGB"))

The digest excludes:

- Image width
- Image height
- Original image mode (before conversion)
- Color profile / ICC data
- Any other PIL metadata

The excluded dimension fields directly determine how the vision encoder preprocesses the image. The encoder resizes images by aspect ratio (e.g. 6×4 → 384×256 vs 4×6 → 256×384), producing semantically different feature tensors. Therefore, digest equality does not imply that the two images will produce equivalent vision features.
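The effect of leaving the dimensions out of the digest can be illustrated with a minimal sketch. It does not use exo or PIL: RawImage is a hypothetical stand-in for a decoded RGB image, and pixel_only_digest is an illustrative name for the hashing pattern quoted above, not a function from the codebase.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RawImage:
    """Hypothetical stand-in for a decoded RGB image (not a PIL type)."""
    width: int
    height: int
    pixels: bytes  # row-major RGB bytes, analogous to tobytes() output in RGB mode

    def tobytes(self) -> bytes:
        return self.pixels


def pixel_only_digest(img: RawImage) -> str:
    # Same pattern as the affected code: only the raw pixel bytes are hashed.
    return hashlib.sha256(img.tobytes()).hexdigest()


# 24 RGB pixels (72 bytes), interpreted once as 6x4 and once as 4x6.
pixels = bytes(range(72))
landscape = RawImage(width=6, height=4, pixels=pixels)
portrait = RawImage(width=4, height=6, pixels=pixels)

same_digest = pixel_only_digest(landscape) == pixel_only_digest(portrait)
print(same_digest)            # True: the digest cannot tell the two apart
print(landscape == portrait)  # False: the images themselves differ
```

The two objects compare unequal because their width and height differ, yet the pixel-only digest is identical for both.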
How the Hash is Used for a Security-Relevant Decision

Feature cache lookup:

    # https://github.com/exo-explore/exo/blob/629c55d6ba201014ab45c48b0e9f984495a30f34/src/exo/worker/engines/mlx/vision.py#L757-L766
    cache_key = self._image_cache_key(images)
    cached = self._feature_cache.pop(cache_key, None)
    if cached is not None:
        self._feature_cache[cache_key] = cached
        image_features, n_tokens_per_image = cached
    else:
        image_features, n_tokens_per_image = self._encoder.encode_images(images)
        self._feature_cache[cache_key] = (image_features, n_tokens_per_image)

The digest is used as the sole key in a process-level dict (_feature_cache, max 32 entries). A cache hit returns pre-computed vision features without re-encoding, and these features are fed directly into the model's generation pipeline.

Prefix cache validation:

    # https://github.com/exo-explore/exo/blob/629c55d6ba201014ab45c48b0e9f984495a30f34/src/exo/worker/engines/mlx/cache.py#L417-L424
    if query_r.content_hash != cached_r.content_hash:
        match_length = cached_r.start_pos
        break

The content_hash is used to validate whether a prefix cache entry's media region corresponds to the same image. If the hashes match, the prefix cache reuses KV states computed from the cached image's features.

Why Hash Equality Does Not Imply Security Equivalence

The issue is not a raw cryptographic break of SHA-256. The issue is application-level hash confusion: the application treats digest equality as image equivalence, but the digest does not encode the spatial dimensions that determine how the image is preprocessed and encoded into features. Two images with identical pixel byte sequences but different width×height produce the same SHA-256 digest. However, the vision encoder's aspect-ratio-aware resize produces different feature tensors for each, so digest equality does not imply feature equivalence.

How the Attacker Constructs a Conflicting Object

The collision is deterministic and requires no brute-force.
Given any W×H image where W ≠ H:

    Image A (victim):   width=6, height=4, RGB pixels = [R0,G0,B0, R1,G1,B1, ..., R23,G23,B23]
    Image B (attacker): width=4, height=6, RGB pixels = [R0,G0,B0, R1,G1,B1, ..., R23,G23,B23]

Both images have identical tobytes() output (72 bytes), therefore identical SHA-256 digest. But after aspect-ratio resize:

- Image A (landscape 3:2) resizes to e.g. 384×256
- Image B (portrait 2:3) resizes to e.g. 256×384

The resulting feature tensors are semantically different.

The attacker's requirement is knowledge of the victim's pixel byte sequence. In scenarios where images are predictable (standard test images, templated charts, well-known icons, shared reference images), this prerequisite is realistic.

Version-Specific Behavior

The _feature_cache and KVPrefixCache are process-level in-memory structures shared across all API requests within a single exo process. exo exposes OpenAI, Claude, and Ollama-compatible API endpoints, so multiple API clients may share the same cache. The vulnerability exists in all versions of exo that include the MLX vision engine with caching (version 0.3.70 and the main branch at commit 629c55d).

Comparison with a Secure Path

Fixed version (PR #2152 (https://github.com/exo-explore/exo/pull/2152)):

    # _image_cache_key (fixed)
    def _image_cache_key(self, images: list[Base64Image]) -> str:
        h = hashlib.sha256()
        for img in images:
            pil = decode_base64_image(img)
            h.update(f"{pil.width}x{pil.height}".encode())
            h.update(pil.tobytes())
        return h.hexdigest()

    # content_hash (fixed)
    img = decode_base64_image(images[i])
    h = hashlib.sha256(f"{img.width}x{img.height}".encode())
    h.update(img.tobytes())
    region.content_hash = h.hexdigest()

The fix prepends "{width}x{height}" to the hash input, ensuring dimension-different images always produce distinct digests.
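As a quick check of the dimension-prefix approach described above, the sketch below hashes the same 72 pixel bytes under both orientations. The helper name dimension_aware_digest and its plain (width, height, pixels) signature are assumptions made for illustration; the real fix operates on decoded PIL images inside exo.

```python
import hashlib


def dimension_aware_digest(width: int, height: int, pixels: bytes) -> str:
    # Same idea as the fix: hash "{width}x{height}" first, then the pixel bytes.
    h = hashlib.sha256(f"{width}x{height}".encode())
    h.update(pixels)
    return h.hexdigest()


pixels = bytes(range(72))  # 24 RGB pixels shared by both orientations

digest_6x4 = dimension_aware_digest(6, 4, pixels)
digest_4x6 = dimension_aware_digest(4, 6, pixels)

digests_differ = digest_6x4 != digest_4x6
print(digests_differ)  # True: swapped dimensions no longer share a digest
```

Because the hash input now starts with different bytes ("6x4" versus "4x6"), the two orientations no longer map to the same cache key.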
Impact

This vulnerability allows attackers to:

- Poison the vision feature cache: an attacker-submitted image with swapped dimensions occupies the cache slot, causing subsequent requests with the victim's image to receive incorrect vision features
- Bypass prefix cache validation: the _validate_media_match check passes on colliding content_hash values, reusing KV states computed from incorrect image features
- Corrupt model inference output: the model generates responses based on vision features from a different image, silently producing wrong answers for vision tasks

The attack is transient: _feature_cache and KVPrefixCache are in-memory structures that are cleared on process restart. However, within a running process, poisoned entries persist until evicted by the LRU policy (max 32 feature cache entries).
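The poisoning sequence can be simulated end to end with a toy model. Everything here is a simplified assumption rather than exo code: toy_encode stands in for the vision encoder, get_features mimics the pop-and-reinsert lookup quoted earlier, and the 32-entry eviction is omitted because it does not affect the outcome.

```python
import hashlib
from typing import Callable

KeyFn = Callable[[int, int, bytes], str]


def pixel_only_key(width: int, height: int, pixels: bytes) -> str:
    # Affected pattern: the dimensions are ignored.
    return hashlib.sha256(pixels).hexdigest()


def dimension_aware_key(width: int, height: int, pixels: bytes) -> str:
    # Pattern from the fix: "{width}x{height}" is hashed before the pixels.
    h = hashlib.sha256(f"{width}x{height}".encode())
    h.update(pixels)
    return h.hexdigest()


def toy_encode(width: int, height: int, pixels: bytes) -> str:
    # Hypothetical stand-in for the vision encoder; output depends on orientation.
    return f"features({width}x{height})"


def get_features(cache: dict[str, str], key_fn: KeyFn,
                 width: int, height: int, pixels: bytes) -> str:
    key = key_fn(width, height, pixels)
    cached = cache.pop(key, None)
    if cached is None:
        cached = toy_encode(width, height, pixels)
    cache[key] = cached  # (re)insert so the entry becomes most recently used
    return cached


def victim_result(key_fn: KeyFn) -> str:
    cache: dict[str, str] = {}
    pixels = bytes(range(72))
    get_features(cache, key_fn, 4, 6, pixels)         # attacker primes the cache
    return get_features(cache, key_fn, 6, 4, pixels)  # victim's request


poisoned = victim_result(pixel_only_key)
clean = victim_result(dimension_aware_key)
print(poisoned)  # features(4x6): victim receives the attacker's features
print(clean)     # features(6x4): victim receives its own features
```

With the pixel-only key, the victim's 6×4 request hits the entry created by the attacker's 4×6 image; with the dimension-aware key, the two requests occupy separate slots.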
Source: https://github.com/exo-explore/exo/issues/2151
User: Dem0 (UID 82596)
Submitted: 05/06/2026 05:24 AM (30 days ago)
Moderation: 04/07/2026 11:06 AM (29 days later)
Status: Accepted
VulDB entry: 376321 [exo-explore exo up to 1.0.71 Vision Feature Cache vision.py _image_cache_key weak encryption]
Points: 20
