| Title | zilliztech GPTCache 0.1.44 Cache poisoning / improper cache key generation |
|---|
| Description | ## Short Title
```text
zilliztech GPTCache file and image cache key collision via BufferedReader.peek()
```
## Vendor
```text
zilliztech
```
## Product
```text
GPTCache
```
## Affected Component
```text
gptcache/processor/pre.py
```
Affected functions:
```text
get_file_bytes()
get_input_str()
get_image_question()
```
Related adapter paths:
```text
gptcache/adapter/openai.py
gptcache/adapter/replicate.py
gptcache/adapter/minigpt4.py
```
## Affected Versions
```text
Affected versions before the fix in PR #678.
Exact released version range needs maintainer confirmation.
```
Suggested wording if VulDB requires a concrete version:
```text
GPTCache versions containing the vulnerable peek()-based preprocessing logic in gptcache/processor/pre.py before the patch from pull request #678.
```
## Vulnerability Class
```text
Cache poisoning / improper cache key generation
```
Suggested CWE mapping:
```text
CWE-20: Improper Input Validation
```
Secondary impact category:
```text
CWE-200: Exposure of Sensitive Information to an Unauthorized Actor
```
## Severity
Suggested severity:
```text
High
```
Suggested CVSS v3.1 vector:
```text
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:N
```
Suggested CVSS v3.1 base score:
```text
8.2
```
If VulDB prefers a more conservative confidentiality-only score:
```text
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
Base score: 7.5
```
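The two suggested scores can be rechecked with a short calculation. The sketch below hard-codes the metric weights and formulas from the CVSS v3.1 specification for the fixed vector prefix AV:N/AC:L/PR:N/UI:N/S:U; the function names `roundup` and `base_score` are illustrative and not part of any official tool.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope unchanged).
AV_N, AC_L, PR_N, UI_N = 0.85, 0.77, 0.85, 0.85
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest number with one decimal that is >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(c: str, i: str, a: str) -> float:
    """Base score for AV:N/AC:L/PR:N/UI:N/S:U with the given C/I/A levels."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV_N * AC_L * PR_N * UI_N
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("H", "L", "N"))  # 8.2
print(base_score("H", "N", "N"))  # 7.5
```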
## Summary
```text
A vulnerability in zilliztech GPTCache allows cache key collisions for file and image inputs. The preprocessing helpers in gptcache/processor/pre.py used Python BufferedReader.peek() to derive cache inputs for files and images. peek() returns bytes from the internal buffer after at most one read on the raw stream, commonly a prefix of about 8192 bytes (the long-standing default buffer size), so it does not represent the full file content.
An attacker can construct two different files or images with the same buffered prefix but different remaining content. GPTCache may then treat the requests as equivalent and return a cached response generated for another file. In shared cache deployments, this can lead to cache poisoning and possible disclosure of cached answers across users.
```
## Technical Details
```text
The vulnerable preprocessing functions used BufferedReader.peek() as part of cache input generation:
- get_file_bytes() returned data.get("file").peek()
- get_input_str() combined input_data["image"].peek() with the question
- get_image_question() used peek()-derived image data when an image path was supplied
Because peek() is a buffered read helper, it can return only the internal buffer prefix rather than the complete file. Files with identical initial bytes and different trailing content can therefore produce identical cache keys or pre-embedding inputs.
The LLM or downstream adapter may process the full file, but GPTCache's lookup can be based on only the prefix. This mismatch means a request for file B can hit a cache entry created for file A.
```
## Attack Requirements
```text
The attacker must be able to send file or image requests to an application using GPTCache with one of the affected preprocessing functions. The impact is most relevant when the cache is shared across users or sessions, or when an attacker can prime and later reuse the same cache namespace.
```
## Proof of Concept
```python
import io

# Fill exactly one default buffer so peek() cannot see past the shared prefix.
prefix = b"A" * io.DEFAULT_BUFFER_SIZE
file_a = io.BufferedReader(io.BytesIO(prefix + b"benign image or audio content"))
file_b = io.BufferedReader(io.BytesIO(prefix + b"different attacker-controlled content"))

# peek() does at most one raw read, so it returns only the buffered prefix.
peek_a = file_a.peek()
peek_b = file_b.peek()
print(peek_a == peek_b)  # True on typical BufferedReader behavior
print(file_a.read() == file_b.read())  # False: full contents are different
```
Impact in GPTCache:
```text
1. A request using file_a is processed and cached.
2. A request using file_b reaches the same preprocessing function.
3. Since the buffered prefixes match, the cache input can collide.
4. GPTCache may return the cached answer for file_a instead of processing file_b.
```
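The four steps above can be simulated with a toy cache. This is only a sketch: `toy_cache`, `peek_key`, and `answer` are hypothetical stand-ins and do not reproduce GPTCache's storage, embedding, or similarity logic. It assumes the default BufferedReader buffer size, as in the proof of concept.

```python
import io

# Hypothetical stand-in for a shared cache; GPTCache's real storage differs.
toy_cache = {}


def peek_key(stream: io.BufferedReader) -> bytes:
    """Derive a key the way the report describes: from peek() only."""
    return stream.peek()


def answer(stream: io.BufferedReader) -> str:
    """Return the cached answer on a key hit, else 'process' the full stream."""
    key = peek_key(stream)
    if key in toy_cache:
        return toy_cache[key]
    # Stands in for the downstream model call, which sees the whole file.
    result = f"processed {len(stream.read())} bytes"
    toy_cache[key] = result
    return result


prefix = b"A" * io.DEFAULT_BUFFER_SIZE  # shared prefix fills the buffer
file_a = io.BufferedReader(io.BytesIO(prefix + b"victim content"))
file_b = io.BufferedReader(io.BytesIO(prefix + b"different, longer attacker content"))

first = answer(file_a)   # step 1: processed and cached
second = answer(file_b)  # steps 2-4: key collides, cached answer is reused
print(first == second)   # True with the default buffer size
```

The second call never reaches the processing branch, even though the two streams differ in length and content.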
## Impact
```text
Successful exploitation can cause GPTCache to return responses for the wrong file or image. This can poison cache entries and, in shared deployments, may expose cached answers generated from another user's input. The issue affects the integrity of cached responses and may affect confidentiality when cached answers reveal information about previous requests.
```
## Fix / Mitigation
```text
Replace peek()-based cache inputs with a deterministic digest over the full file content, for example a streaming SHA-256 hash. After hashing, reset the file pointer with seek(0) so downstream model calls can still read the complete file.
```
The referenced patch implements this approach by adding full-content hashing and updating the affected preprocessing helpers.
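A minimal sketch of full-content hashing follows. It is not the code from PR #678: the helper name `full_content_digest` and the chunk size are hypothetical, and the stream is assumed to be seekable.

```python
import hashlib
import io


def full_content_digest(stream, chunk_size: int = 64 * 1024) -> str:
    """Hash the whole stream with SHA-256, then rewind it for downstream readers."""
    digest = hashlib.sha256()
    stream.seek(0)
    # Read in fixed-size chunks so large files are never loaded at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    stream.seek(0)  # downstream model calls can still read the complete file
    return digest.hexdigest()


prefix = b"A" * io.DEFAULT_BUFFER_SIZE
stream_a = io.BufferedReader(io.BytesIO(prefix + b"benign content"))
stream_b = io.BufferedReader(io.BytesIO(prefix + b"attacker content"))

digest_a = full_content_digest(stream_a)
digest_b = full_content_digest(stream_b)
print(digest_a != digest_b)  # True: the differing tails change the digest
print(stream_a.read().endswith(b"benign content"))  # True: stream was rewound
```

Because the digest covers every byte, two files that share a buffered prefix but differ later no longer map to the same cache input.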
## References
```text
Patch / related pull request:
https://github.com/zilliztech/GPTCache/pull/678
Affected file:
https://github.com/zilliztech/GPTCache/blob/main/gptcache/processor/pre.py
Python BufferedReader.peek() documentation:
https://docs.python.org/3/library/io.html#io.BufferedReader.peek
```
## Disclosure / Coordination Notes
```text
The issue appears to be addressed by PR #678. The exact vulnerable release range and whether a CVE has already been assigned should be confirmed with the maintainers before final submission.
``` |
|---|
| Source | ⚠️ https://github.com/zilliztech/GPTCache/issues/684 |
|---|
| User | Dem0 (UID 82596) |
|---|
| Submission | 2026-05-16 18:07 (20 days ago) |
|---|
| Moderation | 2026-06-04 07:23 (19 days later) |
|---|
| Status | Accepted |
|---|
| VulDB entry | 368260 [zilliztech GPTCache up to 0.1.44 Cache Key pre.py BufferedReader.peek input_data["image"] weak encryption] |
|---|
| Points | 20 |
|---|