Submission #831636: zilliztech GPTCache 0.1.44 Cache poisoning / improper cache key generation

Title	zilliztech GPTCache 0.1.44 Cache poisoning / improper cache key generation
Description

## Short Title

```text
zilliztech GPTCache file and image cache key collision via BufferedReader.peek()
```

## Vendor

```text
zilliztech
```

## Product

```text
GPTCache
```

## Affected Component

```text
gptcache/processor/pre.py
```

Affected functions:

```text
get_file_bytes()
get_input_str()
get_image_question()
```

Related adapter paths:

```text
gptcache/adapter/openai.py
gptcache/adapter/replicate.py
gptcache/adapter/minigpt4.py
```

## Affected Versions

```text
Affected versions before the fix in PR #678.
Exact released version range needs maintainer confirmation.
```

Suggested wording if VulDB requires a concrete version:

```text
GPTCache versions containing the vulnerable peek()-based preprocessing logic in gptcache/processor/pre.py before the patch from pull request #678.
```

## Vulnerability Class

```text
Cache poisoning / improper cache key generation
```

Suggested CWE mapping:

```text
CWE-20: Improper Input Validation
```

Secondary impact category:

```text
CWE-200: Exposure of Sensitive Information to an Unauthorized Actor
```

## Severity

Suggested severity:

```text
High
```

Suggested CVSS v3.1 vector:

```text
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:N
```

Suggested CVSS v3.1 base score:

```text
8.2
```

If VulDB prefers a more conservative confidentiality-only score:

```text
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
Base score: 7.5
```

## Summary

```text
A vulnerability in zilliztech GPTCache allows cache key collisions for file and image inputs.

The preprocessing helpers in gptcache/processor/pre.py used Python BufferedReader.peek() to derive cache inputs for files and images. peek() only returns bytes currently available in the internal buffer, commonly a prefix of about 8192 bytes, and does not represent the full file content.

An attacker can construct two different files or images with the same buffered prefix but different remaining content. GPTCache may then treat the requests as equivalent and return a cached response generated for another file.

In shared cache deployments, this can lead to cache poisoning and possible disclosure of cached answers across users.
```

## Technical Details

```text
The vulnerable preprocessing functions used BufferedReader.peek() as part of cache input generation:

- get_file_bytes() returned data.get("file").peek()
- get_input_str() combined input_data["image"].peek() with the question
- get_image_question() used peek()-derived image data when an image path was supplied

Because peek() is a buffered read helper, it can return only the internal buffer prefix rather than the complete file. Files with identical initial bytes and different trailing content can therefore produce identical cache keys or pre-embedding inputs.

The LLM or downstream adapter may process the full file, but GPTCache's lookup can be based on only the prefix. This mismatch means a request for file B can hit a cache entry created for file A.
```

## Attack Requirements

```text
The attacker must be able to send file or image requests to an application using GPTCache with one of the affected preprocessing functions.

The impact is most relevant when the cache is shared across users or sessions, or when an attacker can prime and later reuse the same cache namespace.
```

## Proof of Concept

```python
import io

prefix = b"A" * 8192

file_a = io.BufferedReader(io.BytesIO(prefix + b"benign image or audio content"))
file_b = io.BufferedReader(io.BytesIO(prefix + b"different attacker-controlled content"))

peek_a = file_a.peek()
peek_b = file_b.peek()

print(peek_a == peek_b)  # True on typical BufferedReader behavior
print(file_a.read() == file_b.read())  # False: full contents are different
```

Impact in GPTCache:

```text
1. A request using file_a is processed and cached.
2. A request using file_b reaches the same preprocessing function.
3. Since the buffered prefixes match, the cache input can collide.
4. GPTCache may return the cached answer for file_a instead of processing file_b.
```

## Impact

```text
Successful exploitation can cause GPTCache to return responses for the wrong file or image. This can poison cache entries and may expose cached answers generated from another user's input in shared deployments.

The issue affects integrity of cached responses and may affect confidentiality when cached answers reveal information about previous requests.
```

## Fix / Mitigation

```text
Replace peek()-based cache inputs with a deterministic digest over the full file content, for example a streaming SHA-256 hash.

After hashing, reset the file pointer with seek(0) so downstream model calls can still read the complete file.
```

The referenced patch implements this approach by adding full-content hashing and updating the affected preprocessing helpers.

## References

```text
Patch / related pull request:
https://github.com/zilliztech/GPTCache/pull/678

Affected file:
https://github.com/zilliztech/GPTCache/blob/main/gptcache/processor/pre.py

Python BufferedReader.peek() documentation:
https://docs.python.org/3/library/io.html#io.BufferedReader.peek
```

## Disclosure / Coordination Notes

```text
The issue appears to be addressed by PR #678.
The exact vulnerable release range and whether a CVE has already been assigned should be confirmed with the maintainers before final submission.
```
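
## Mitigation Sketch

The approach described under Fix / Mitigation (a streaming SHA-256 digest over the full content, followed by seek(0)) could look roughly like the sketch below. The helper name `full_content_digest` and the `chunk_size` parameter are hypothetical and chosen for illustration; this is not the code from PR #678, and it assumes the input stream is seekable.

```python
import hashlib
import io


def full_content_digest(file_obj, chunk_size=64 * 1024):
    """Hypothetical helper: hash the entire stream, then rewind it."""
    hasher = hashlib.sha256()
    # Start from the beginning so the digest always covers the whole file
    # (assumes a seekable stream).
    file_obj.seek(0)
    # Read fixed-size chunks so large files are never held in memory at once.
    for chunk in iter(lambda: file_obj.read(chunk_size), b""):
        hasher.update(chunk)
    # Rewind so a downstream model call can still read the complete file.
    file_obj.seek(0)
    return hasher.hexdigest()


# Same inputs as the proof of concept: identical prefix, different tails.
prefix = b"A" * 8192
file_a = io.BufferedReader(io.BytesIO(prefix + b"benign image or audio content"))
file_b = io.BufferedReader(io.BytesIO(prefix + b"different attacker-controlled content"))

digest_a = full_content_digest(file_a)
digest_b = full_content_digest(file_b)

# The digests differ because they cover the trailing bytes as well.
print(digest_a != digest_b)
```

Using a digest rather than the raw bytes as the cache input keeps the key size constant regardless of file size, at the cost of one full read of the file per request.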
Source	⚠️ https://github.com/zilliztech/GPTCache/issues/684
User	Dem0 (UID 82596)
Submitted	2026-05-16 18:07 (20 days ago)
Moderated	2026-06-04 07:23 (19 days later)
Status	Accepted
VulDB entry	368260 [zilliztech GPTCache up to 0.1.44 Cache Key pre.py BufferedReader.peek input_data["image"] weak encryption]
Points	20

