Submit #831452: PaddlePaddle FastDeploy 2.4.1_20260331_0 hash collision

Title: PaddlePaddle FastDeploy 2.4.1_20260331_0 hash collision
Description: Below is the translated and polished English version of the vulnerability report:

---

## Vulnerability Report: Cache Key Collision in `MultimodalHasher.hash_features()`

### Summary

`MultimodalHasher.hash_features()` computes a SHA-256 hash over the output of `np.ndarray.tobytes()` and uses it as the cache key for multimodal caching. However, `tobytes()` serializes only the raw element bytes of a NumPy array. It does **not** encode metadata such as the array's `shape` or `dtype`. As a result, different arrays produce the same hash whenever their underlying byte representations are identical. This can lead to cache key collisions and incorrect cache hits.

### Issue Description

The current implementation hashes multimodal features using only:

```python
np.ndarray.tobytes()
```

This is insufficient because `tobytes()` does not include structural metadata. Specifically:

1. Arrays with different shapes but identical flattened bytes produce the same hash. For example, arrays of shape `(6, 4)` and `(4, 6)` built from the same 24 values hash to the same value.
2. Arrays with different dtypes also collide if their raw byte patterns are identical, for example when the same buffer is viewed as `float32` and as `uint8`.

### Proof of Concept

```python
import hashlib

import numpy as np

base = np.arange(24, dtype=np.float32)
a = base.reshape(6, 4)
b = base.reshape(4, 6)

# Passes: different shapes, same hash
assert hashlib.sha256(a.tobytes()).hexdigest() == hashlib.sha256(b.tobytes()).hexdigest()
```

Although `a` and `b` have different shapes, they produce the same SHA-256 hash because their serialized bytes are identical.
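The PoC above covers only the shape case (item 1). The dtype case (item 2) can be reproduced the same way by reinterpreting a `float32` buffer with `ndarray.view()`. This is a minimal sketch: `naive_hash` is a hypothetical helper that stands in for the reported bytes-only hashing, not FastDeploy code, and the array values are arbitrary.

```python
import hashlib

import numpy as np


def naive_hash(arr: np.ndarray) -> str:
    # Hypothetical helper mirroring the reported behavior: hash raw bytes only.
    return hashlib.sha256(arr.tobytes()).hexdigest()


f = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)  # 16 bytes of data
u = f.view(np.uint8)    # same 16 bytes, shape (16,), dtype uint8
i32 = f.view(np.int32)  # same 16 bytes, shape (4,), dtype int32

# Different dtype and shape, yet identical hashes.
assert naive_hash(f) == naive_hash(u)
# i32 has the same shape as f, so hashing the shape alone would not help here.
assert f.shape == i32.shape and f.dtype != i32.dtype
assert naive_hash(f) == naive_hash(i32)
print("dtype collision reproduced")
```

The `int32` view is the stronger case: shape and byte length match exactly, so only the dtype distinguishes the two arrays.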
### Impact

This hash is used as a cache key across the three-level multimodal caching system:

* `ProcessorCacheManager`: caches preprocessed pixel tensors
* `EncoderCacheManager`: caches output features from the vision encoder
* `PrefixCacheManager`: caches KV-block prefixes, where `mm_hash` is injected via `get_block_hash_extra_keys`

A hash collision can cause an incorrect cache hit that returns stale or incorrect multimodal features. This may lead to incorrect model outputs, cross-request contamination of multimodal representations, or hard-to-debug inference inconsistencies.

### Security Implications

Because the cache key does not capture the array's full identity (its shape and dtype as well as its bytes), an attacker could craft distinct multimodal inputs that collide at the cache layer, and malformed input could trigger such a collision by accident. In either case the system reuses cached features for a different input than intended. Depending on the deployment context, this may affect:

* correctness of multimodal model responses;
* isolation between requests;
* integrity of cached vision features;
* reliability of prefix/KV cache reuse.

### Recommended Fix

The hash computation should include array metadata in addition to the raw bytes, especially:

* `shape`
* `dtype`
* optionally, memory layout/order information such as C-contiguous or F-contiguous layout (note that `tobytes()` defaults to `order='C'`, so its output is already independent of the in-memory layout unless a different `order` is passed)

For example:

```python
import hashlib

import numpy as np


def hash_array(arr: np.ndarray) -> str:
    """Hash an array's shape, dtype, and raw bytes together."""
    h = hashlib.sha256()
    h.update(str(arr.shape).encode())
    h.update(str(arr.dtype).encode())
    h.update(arr.tobytes())
    return h.hexdigest()
```

A more robust implementation could use a structured serialization format, or explicitly encode the metadata in a stable binary format before hashing.

### Suggested Severity

**Medium**, potentially **High** depending on whether the cache is shared across users, sessions, or requests. The vulnerability can cause incorrect cache hits that return features belonging to a different multimodal input, which may compromise output integrity and request isolation.
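The Recommended Fix section notes that a more robust implementation could encode the metadata in a stable binary format before hashing. The sketch below is one possible reading of that suggestion, not FastDeploy's actual fix: `hash_features_with_metadata` is a hypothetical name, and the sketch assumes plain numeric feature arrays (structured dtypes would also need their field layout, for example `dtype.descr`, in the header). It length-prefixes the dtype code and encodes the shape as fixed-width integers, so field boundaries never depend on delimiter characters, and it is checked against both collision cases described in the report.

```python
import hashlib
import struct

import numpy as np


def hash_features_with_metadata(arr: np.ndarray) -> str:
    """Hypothetical metadata-aware hash (not FastDeploy's API).

    Assumes plain numeric dtypes. Each metadata field is length-prefixed
    or fixed-width, so field boundaries never depend on delimiters.
    """
    h = hashlib.sha256()
    # dtype.str is a compact code that includes byte order, e.g. '<f4'.
    dtype_code = arr.dtype.str.encode("ascii")
    h.update(struct.pack("<I", len(dtype_code)))
    h.update(dtype_code)
    # Number of dimensions, then each dimension as an unsigned 64-bit integer.
    h.update(struct.pack("<I", arr.ndim))
    h.update(struct.pack(f"<{arr.ndim}Q", *arr.shape))
    # Element bytes in C order, independent of the array's memory layout.
    h.update(arr.tobytes(order="C"))
    return h.hexdigest()


base = np.arange(24, dtype=np.float32)
a, b = base.reshape(6, 4), base.reshape(4, 6)
f = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)

# Shape case from the PoC: the keys now differ.
assert hash_features_with_metadata(a) != hash_features_with_metadata(b)
# dtype case: same bytes and same shape, different dtype, different keys.
assert hash_features_with_metadata(f) != hash_features_with_metadata(f.view(np.int32))
# The same logical array stored in Fortran order still maps to the same key.
assert hash_features_with_metadata(a) == hash_features_with_metadata(np.asfortranarray(a))
```

Keying on the logical C-order bytes means identical features stored with a different memory layout still hit the cache, while any change in shape or dtype yields a different key.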
Source: https://github.com/PaddlePaddle/FastDeploy/issues/7196
User: Dem0 (UID 82596)
Submission: 16/05/2026 09:41 (19 days ago)
Moderation: 04/06/2026 06:57 (19 days later)
Status: Accepted
VulDB entry: 368249 [PaddlePaddle FastDeploy up to 2.4.1 MultimodalHasher hasher.py hash_features weak encryption]
Points: 20
