## Cache Poisoning via Partial-Object Hashing in Multimodal Embedding Cache
### Summary
An attacker who can submit multimodal inputs or `precomputed_embedding` payloads can craft non-equivalent multimodal features that map to the same application-level cache key. The resulting key reuse causes incorrect embedding reuse and potential denial of service for users who share the same SGLang multimodal embedding cache.
The violated security assumption is that equality of the multimodal hash implies equality of the underlying multimodal object. That assumption is false because the hash omits shape, dtype, tensor-boundary, and serialization metadata, and because part of the cache key is produced with Python's built-in, non-cryptographic `hash()`.
### Affected Versions
- **Confirmed affected:** `sglang` version `0.5.12.dev476+g4df42da65`, reported in issue `#22032`; vulnerable source behavior confirmed at commit `56ac9c9932be44558803d60325a6a175dc6fe04d`
- **Other versions:** Not verified
- **Fixed:** local/source fix commit `bc3899b125760df073480f29eb73fca1a2b4cdac`; released fixed version not available at the time of reporting
- **Introduced in:** Not verified
Existing multimodal cache entries generated with the vulnerable digest scheme may remain unsafe after upgrading unless the cache namespace is invalidated or the digest schema is versioned.
### Details
The vulnerability occurs because SGLang computes multimodal feature hashes over raw tensor or ndarray bytes and then uses the resulting values as cache identity for multimodal embeddings. However, the digest does not include all fields required to determine security equivalence, such as tensor shape, dtype, list boundaries, feature format, model/cache schema, and trust-domain information. As a result, two distinct multimodal inputs can be treated as the same cached object.
**Where the Hash is Computed**
The following code is from vulnerable commit `56ac9c9932be44558803d60325a6a175dc6fe04d`.
```python
# https://github.com/sgl-project/sglang/blob/56ac9c9932be44558803d60325a6a175dc6fe04d/python/sglang/srt/managers/mm_utils.py#L1314-L1367
def data_hash(data) -> int:
    hash_bytes = hashlib.sha256(data).digest()[:8]
    return int.from_bytes(hash_bytes, byteorder="big", signed=False)


def tensor_hash(tensor_list) -> int:
    tensor = tensor_list
    if isinstance(tensor_list, list):
        tensor_list = flatten_nested_list(tensor_list)
        tensors = [
            x.flatten() if isinstance(x, torch.Tensor) else x for x in tensor_list
        ]
        if any(isinstance(t, torch.Tensor) and t.is_cuda for t in tensors):
            tensor = torch.concat(tensors)
            return gpu_tensor_hash(tensor.cuda())
        hasher = hashlib.sha256()
        for t in tensors:
            t = t.detach().contiguous()
            hasher.update(memoryview(t.view(torch.uint8).numpy()))
        hash_bytes = hasher.digest()[:8]
        return int.from_bytes(hash_bytes, byteorder="big", signed=False)

    if tensor.is_cuda:
        return gpu_tensor_hash(tensor.cuda())
    tensor = tensor.detach().contiguous()
    hasher = hashlib.sha256()
    hasher.update(memoryview(tensor.view(torch.uint8).numpy()))
    hash_bytes = hasher.digest()[:8]
    return int.from_bytes(hash_bytes, byteorder="big", signed=False)


def hash_feature(f):
    if isinstance(f, list):
        if isinstance(f[0], torch.Tensor):
            return tensor_hash(f)
        return data_hash(tuple(flatten_nested_list(f)))
    elif isinstance(f, np.ndarray):
        arr = np.ascontiguousarray(f)
        hasher = hashlib.sha256()
        hasher.update(memoryview(arr))
        hash_bytes = hasher.digest()[:8]
        return int.from_bytes(hash_bytes, byteorder="big", signed=False)
    elif isinstance(f, torch.Tensor):
        return tensor_hash([f])
    elif isinstance(f, CudaIpcTensorTransportProxy):
        reconstruct_t = f.reconstruct_on_target_device(torch.cuda.current_device())
        return tensor_hash([reconstruct_t])
    return data_hash(f)
```
The digest is computed from raw feature bytes only. It is truncated to 64 bits with `digest()[:8]`. Tensor-list boundaries are lost by flattening each tensor and hashing only the concatenated byte stream. The ndarray path hashes `memoryview(arr)` without shape or dtype metadata. The non-tensor list path passes a `tuple` to `data_hash`, which calls `hashlib.sha256()` and raises `TypeError` because the input is not bytes-like.
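The boundary and metadata loss can be reproduced without SGLang or PyTorch. The following is a minimal standard-library emulation, not the project's code: `raw_bytes_digest` and `f32` are hypothetical helpers, and the sketch assumes only that the digest covers the concatenated raw bytes and is truncated to 64 bits, as in the CPU paths quoted above.

```python
import hashlib
import struct


def raw_bytes_digest(chunks) -> int:
    """Hypothetical stand-in for the vulnerable scheme: hash raw bytes only."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)  # no length, shape, or dtype is mixed in
    return int.from_bytes(hasher.digest()[:8], byteorder="big", signed=False)


def f32(*values) -> bytes:
    """Pack values as little-endian float32, like a flattened float32 tensor."""
    return struct.pack(f"<{len(values)}f", *values)


# Same five floats, different per-tensor boundaries.
victim = [f32(1.0, 2.0), f32(3.0, 4.0, 5.0)]
attacker = [f32(1.0, 2.0, 3.0), f32(4.0, 5.0)]
boundary_collision = raw_bytes_digest(victim) == raw_bytes_digest(attacker)

# Same eight bytes, different dtype: float32 [1.0, 2.0] vs. the int32 values
# that share those IEEE-754 bit patterns.
as_float32 = f32(1.0, 2.0)
as_int32 = struct.pack("<2i", 0x3F800000, 0x40000000)
dtype_collision = raw_bytes_digest([as_float32]) == raw_bytes_digest([as_int32])

print(boundary_collision, dtype_collision)
```

Both comparisons evaluate to `True`: incremental `update()` calls are equivalent to hashing the concatenation, so nothing in the digest distinguishes where one chunk ends or how its bytes are typed.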
**What Fields Are Included or Excluded**
The digest includes:
- Raw tensor or ndarray bytes
- For CPU tensor lists, the byte sequence after flattening each tensor
- For CUDA tensor lists, the byte sequence after `torch.concat()` before `gpu_tensor_hash()`
The digest excludes:
- Tensor shape
- Tensor dtype
- Per-tensor list boundaries
- ndarray shape
- ndarray dtype
- Multimodal input format, such as raw image/video/audio, `processor_output`, or `precomputed_embedding`
- Model identifier and processor version
- Cache or digest schema version
- Tenant, user, request, or trust-domain namespace
The excluded fields affect whether two multimodal features are safe to reuse interchangeably. Shape, dtype, and tensor boundaries affect model semantics; format and processor version affect provenance; cache schema and trust-domain namespace affect whether cross-request or cross-tenant reuse is safe.
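One way to bind the excluded fields into the digest is sketched below. This is an illustration under stated assumptions, not the implementation in the fix commit: `bound_feature_digest`, `_field`, the `(dtype_name, shape, raw_bytes)` item layout, and the label values such as `"mm-digest-v2"` and `"tenant-a"` are all hypothetical. Each field is length-prefixed so that adjacent fields cannot be re-split, and the full SHA-256 digest is kept.

```python
import hashlib
import struct


def _field(data: bytes) -> bytes:
    """Length-prefix a field so neighbouring fields cannot be re-partitioned."""
    return struct.pack(">Q", len(data)) + data


def bound_feature_digest(items, *, schema, namespace, fmt, model) -> str:
    """Digest over context labels plus per-item dtype, shape, and bytes.

    items: iterable of (dtype_name, shape, raw_bytes) tuples (hypothetical layout).
    """
    hasher = hashlib.sha256()
    for label in (schema, namespace, fmt, model):
        hasher.update(_field(label.encode("utf-8")))
    items = list(items)
    hasher.update(struct.pack(">Q", len(items)))  # number of items
    for dtype_name, shape, raw in items:
        hasher.update(_field(dtype_name.encode("utf-8")))
        hasher.update(struct.pack(">Q", len(shape)))  # rank
        for dim in shape:
            hasher.update(struct.pack(">Q", dim))
        hasher.update(_field(raw))
    return hasher.hexdigest()  # full 256 bits, not truncated


def f32(*values) -> bytes:
    return struct.pack(f"<{len(values)}f", *values)


ctx = dict(schema="mm-digest-v2", namespace="tenant-a",
           fmt="precomputed_embedding", model="example-model")
victim = [("float32", (2,), f32(1.0, 2.0)), ("float32", (3,), f32(3.0, 4.0, 5.0))]
attacker = [("float32", (3,), f32(1.0, 2.0, 3.0)), ("float32", (2,), f32(4.0, 5.0))]

digests_differ = bound_feature_digest(victim, **ctx) != bound_feature_digest(attacker, **ctx)
other_tenant = dict(ctx, namespace="tenant-b")
namespaces_differ = bound_feature_digest(victim, **ctx) != bound_feature_digest(victim, **other_tenant)
print(digests_differ, namespaces_differ)
```

Changing the schema label also yields a fresh key space, which is one way to invalidate entries written under an older digest scheme.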
**How the Hash is Used for a Security-Relevant Decision**
The feature hash is assigned to each multimodal data item and later used as the cache key for embedding reuse.
```python
# https://github.com/sgl-project/sglang/blob/56ac9c9932be44558803d60325a6a175dc6fe04d/python/sglang/srt/managers/schedule_batch.py#L277-L299
def set_pad_value(self):
    """
    Set the pad value after first hashing the data
    """
    if self.pad_value is not None:
        return
    from sglang.srt.managers.mm_utils import hash_feature

    if envs.SGLANG_MM_SKIP_COMPUTE_HASH.get():
        import uuid

        self.hash = uuid.uuid4().int
        self.pad_value = _compute_pad_value(self.hash)
        return
    if self.hash is None:
        if self.feature is not None:
            hashed_feature = self.feature
        else:
            hashed_feature = self.precomputed_embeddings
        self.hash = hash_feature(hashed_feature)
    assert self.hash is not None
    self.pad_value = _compute_pad_value(self.hash)
```
```python
# https://github.com/sgl-project/sglang/blob/56ac9c9932be44558803d60325a6a175dc6fe04d/python/sglang/srt/mem_cache/multimodal_cache.py#L17-L24
@staticmethod
def combine_hashes(mm_hashes: List[int]) -> Optional[int]:
    """
    Get a combined hash from individual mm item hashes
    """
    if not mm_hashes:
        return None
    return hash(tuple(mm_hashes))
```
```python
# https://github.com/sgl-project/sglang/blob/56ac9c9932be44558803d60325a6a175dc6fe04d/python/sglang/srt/mem_cache/multimodal_cache.py#L91-L120
def get(
    self, mm_hashes: List[int], combined_hash: Optional[int] = None
) -> Optional[EmbeddingResult]:
    combined_hash = self.combine_hashes(mm_hashes)
    embedding = self.mm_cache.get(combined_hash)
    if embedding is not None:
        self.mm_cache.move_to_end(combined_hash)
    return embedding


def set(
    self,
    mm_hash: int,
    embedding: EmbeddingResult,
    loc: Optional[torch.Tensor] = None,
) -> bool:
    assert isinstance(embedding, EmbeddingResult), embedding
    if mm_hash in self.mm_cache:
        self.mm_cache.move_to_end(mm_hash)
        return True
    data_size = _get_tensor_size(embedding.embedding)
    while self.current_size + data_size > self.max_size:
        if not self.mm_cache:
            return False
        lru_hash, lru_embedding = self.mm_cache.popitem(last=False)
        self.current_size -= _get_tensor_size(lru_embedding.embedding)

    self.mm_cache[mm_hash] = embedding
    self.current_size += data_size
    return True
```
The digest is used as the cache key for sensitive model embedding results. Therefore, two non-equivalent multimodal inputs that share the same application-level digest can retrieve or overwrite the same cached embedding. In addition, `combine_hashes()` uses Python's built-in `hash()`, which is non-cryptographic and is not guaranteed to be stable across Python versions or builds. Note that `PYTHONHASHSEED` randomization salts `str` and `bytes` hashes; a tuple of `int` values hashes deterministically on a given CPython build, so combined keys should be expected to diverge across workers only when their interpreter builds differ.
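Independently of seeding, the combining step is not collision resistant. The sketch below assumes CPython, where `hash()` of a non-negative `int` is the value modulo `sys.hash_info.modulus` and a tuple's hash depends only on its elements' hashes. `combine_hashes_stable` is a hypothetical alternative, not SGLang's API and not the fix commit's implementation.

```python
import hashlib
import sys
from typing import List, Optional

# CPython reduces non-negative ints modulo this prime before hashing.
P = sys.hash_info.modulus

a = 123456789
b = a + P  # a distinct value that still fits in an unsigned 64-bit item hash
builtin_collides = hash((a,)) == hash((b,))


def combine_hashes_stable(mm_hashes: List[int]) -> Optional[str]:
    """Hypothetical combiner: SHA-256 over a count plus fixed-width item hashes."""
    if not mm_hashes:
        return None
    hasher = hashlib.sha256()
    hasher.update(len(mm_hashes).to_bytes(8, "big"))
    for h in mm_hashes:
        # 16 bytes also covers the 128-bit uuid4-based hashes used when
        # hash computation is skipped.
        hasher.update(h.to_bytes(16, "big", signed=False))
    return hasher.hexdigest()


stable_differs = combine_hashes_stable([a]) != combine_hashes_stable([b])
print(builtin_collides, stable_differs)
```

Finding two real features whose truncated digests differ by exactly the modulus is not trivial, so this is best read as evidence that `hash()` shrinks and weakens the key space rather than as a standalone exploit.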
**Why Hash Equality Does Not Imply Security Equivalence**
The issue is not a raw cryptographic break of SHA-256. The issue is application-level hash confusion: the application treats digest equality as object equivalence, but the digest does not encode all fields required to determine security equivalence.
Four distinct weaknesses contribute:
- **Partial-object hashing:** two tensor lists with the same flattened bytes but different tensor boundaries produce the same digest.
- **Non-canonical serialization:** two ndarrays with the same bytes but different shapes produce the same digest.
- **Truncated digest:** the SHA-256 result is reduced to 64 bits, so by the birthday bound a collision becomes likely after roughly 2^32 digests, which makes accidental or attacker-generated collisions more practical at high request volumes.
- **Non-cryptographic hashing:** `hash(tuple(mm_hashes))` is not collision resistant and is not guaranteed to be stable across Python versions or builds.
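The effect of truncating to 64 bits can be estimated with the standard birthday approximation. The helper below is an illustrative calculation, not part of SGLang; `collision_probability` is a hypothetical name, and the model assumes digests behave like uniformly random values.

```python
import math


def collision_probability(n_items: int, bits: int = 64) -> float:
    """Birthday approximation: P(at least one collision) among n random digests."""
    pairs = n_items * (n_items - 1) / 2
    # 1 - exp(-x), computed with expm1 to stay accurate for very small x.
    return -math.expm1(-pairs / 2**bits)


for n in (10**6, 10**9, 2**32):
    print(f"{n:>12} items: p ~ {collision_probability(n):.3e}")

# The same item count against an untruncated 256-bit digest.
print(f"256-bit digest, 2**32 items: p ~ {collision_probability(2**32, bits=256):.3e}")
```

At about 2^32 digests the estimate is roughly 0.39 for the truncated value and negligible for the full digest. An attacker searching offline for any colliding pair faces the same order of work, which is well within reach of commodity hardware.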
**How the Attacker Constructs a Conflicting Object**
Two tensor-list inputs can be semantically distinct while sharing the same byte stream after flattening:
```python
victim_feature = [
    torch.tensor([1.0, 2.0]),
    torch.tensor([3.0, 4.0, 5.0]),
]
attacker_feature = [
    torch.tensor([1.0, 2.0, 3.0]),
    torch.tensor([4.0, 5.0]),
]
```
Both inputs produce the same flattened byte sequence, but they are not security-equivalent because they represent different feature partitioning and can correspond to different multimodal items, image/video chunking, or processor