Submission #831508: streamlit 1.53.0 hash collision

Title: streamlit 1.53.0 hash collision
Description:

**Title:** Streamlit `@st.cache_data` / `@st.cache_resource` Cache Key Collision in `hashing.py`

**Description:** A cache key collision vulnerability exists in Streamlit’s runtime caching implementation in `lib/streamlit/runtime/caching/hashing.py`. The affected logic is used by `@st.cache_data` and `@st.cache_resource` when hashing cached function arguments.

The issue is caused by incomplete and predictable hashing of selected input types. For large Pandas, Polars, and NumPy objects, Streamlit hashes only a sampled subset of the input once the object exceeds the internal large-object threshold. The sampling operation uses a fixed deterministic seed, such as `random_state=0`, `seed=0`, or `np.random.RandomState(0)`. Because the sampled indices are globally predictable, an attacker can craft two different objects that are identical at every sampled position but differ at non-sampled positions, resulting in the same cache key.

A second collision condition exists for PIL palette-indexed images using `mode="P"`. In this case, the hashing path uses `Image.tobytes()`, which serializes only the per-pixel palette indices and does not include the image palette itself. Two visually different images with the same pixel indices but different palettes can therefore produce the same cache key.

Successful exploitation may cause Streamlit to return stale, incorrect, or attacker-influenced cached results without raising an error. In applications where user-controlled input reaches cached functions, a remote attacker may be able to poison the cache. The impact is especially relevant for `@st.cache_resource`, which is shared across user sessions, potentially causing subsequent users to receive results derived from a different input.
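The sampling collision described above can be illustrated with a small standard-library model. This is only a sketch, not Streamlit's code: `LARGE_THRESHOLD`, `SAMPLE_SIZE`, `sampled_indices`, and `sampled_hash` are hypothetical names and values chosen for the example, and a plain Python list stands in for a NumPy, Pandas, or Polars object.

```python
import hashlib
import random

# Hypothetical values for illustration; not Streamlit's real thresholds.
LARGE_THRESHOLD = 1_000
SAMPLE_SIZE = 100


def sampled_indices(n: int) -> list[int]:
    """Pick SAMPLE_SIZE positions using a fixed seed, so anyone can recompute them."""
    rng = random.Random(0)  # fixed seed, analogous to random_state=0
    return sorted(rng.sample(range(n), SAMPLE_SIZE))


def sampled_hash(values: list[int]) -> str:
    """Hash the length plus either the full data or, for large inputs, a fixed sample."""
    h = hashlib.sha256()
    h.update(str(len(values)).encode())
    if len(values) >= LARGE_THRESHOLD:
        values = [values[i] for i in sampled_indices(len(values))]
    h.update(repr(values).encode())
    return h.hexdigest()


original = list(range(10_000))

# The attacker recomputes the sampled positions and changes everything else.
keep = set(sampled_indices(len(original)))
crafted = [v if i in keep else -1 for i, v in enumerate(original)]

changed = sum(1 for a, b in zip(original, crafted) if a != b)
print("elements changed:", changed)
print("same hash:", sampled_hash(original) == sampled_hash(crafted))
```

In this model the two lists differ at every non-sampled position yet hash identically, because only the length and the sampled elements ever reach the hash function.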
**Affected Component:** `lib/streamlit/runtime/caching/hashing.py`

**Affected Feature:** `@st.cache_data`, `@st.cache_resource`

**Affected Objects:** Pandas `Series` / `DataFrame`, Polars `Series` / `DataFrame`, NumPy `ndarray`, and PIL `Image.Image` in palette mode.

**Attack Type:** Remote, if a Streamlit application exposes cached functions that process attacker-controlled large data objects or palette-indexed images.

**Impact:** Cache poisoning, incorrect cache hits, stale or wrong data returned to users, and possible cross-session integrity impact.

**Suggested CWE:** CWE-345: Insufficient Verification of Data Authenticity. Alternative: CWE-682: Incorrect Calculation.

**Suggested CVSS v3.1:** `AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:H/A:N` (6.5 Medium). For publicly exposed apps where no authentication is required to submit crafted input, this may be adjusted to `AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N` (7.5 High).

**Proof of Concept Summary:** For large NumPy arrays, an attacker can compute the fixed sampled indices generated by `np.random.RandomState(0)` and modify only the non-sampled elements. The resulting object differs semantically from the original but produces the same Streamlit cache hash. A similar technique applies to Pandas and Polars objects because the sampling seed is fixed.

For PIL P-mode images, two images can share identical pixel index bytes while using different RGB palettes. Since the palette bytes are not included in the hash, both images receive the same cache key despite being visually different.

**Mitigation:** The cache key generation should include all security-relevant input data. For large object sampling, the sampling seed should not be globally fixed and predictable; it should be derived from the input or otherwise hardened so sampled indices cannot be precomputed for collision construction. For PIL P-mode images, palette bytes returned by `getpalette()` should be included in the hashed data along with pixel bytes.
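The palette omission and the suggested fix can be sketched without PIL. This is a simplified model under stated assumptions, not Streamlit's or Pillow's code: `PaletteImage`, `hash_indices_only`, and `hash_with_palette` are hypothetical names, with `indices` standing in for the bytes returned by `Image.tobytes()` and `palette` for the values returned by `getpalette()`.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PaletteImage:
    """Hypothetical stand-in for a PIL image in mode "P"."""
    indices: bytes  # one palette index per pixel
    palette: bytes  # flat RGB triples, one triple per palette entry


def hash_indices_only(img: PaletteImage) -> str:
    """Model of the vulnerable path: only the pixel indices are hashed."""
    return hashlib.sha256(img.indices).hexdigest()


def hash_with_palette(img: PaletteImage) -> str:
    """Model of the mitigation: hash the pixel indices and the palette."""
    h = hashlib.sha256()
    # Length prefix keeps the indices/palette boundary unambiguous.
    h.update(len(img.indices).to_bytes(8, "big"))
    h.update(img.indices)
    h.update(img.palette)
    return h.hexdigest()


# Same pixel indices, different colors behind index 0.
red = PaletteImage(indices=bytes([0, 1, 0, 1]), palette=bytes([255, 0, 0, 0, 0, 0]))
blue = PaletteImage(indices=bytes([0, 1, 0, 1]), palette=bytes([0, 0, 255, 0, 0, 0]))

print("indices-only collision:", hash_indices_only(red) == hash_indices_only(blue))
print("palette-aware collision:", hash_with_palette(red) == hash_with_palette(blue))
```

In this model the indices-only hash cannot tell the two images apart, while the palette-aware hash separates them. The length prefix is a design choice of the sketch so that bytes cannot shift between the two fields without changing the digest.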
**References:**

* GitHub Issue: `streamlit/streamlit#14622`
* Related Pull Request: `streamlit/streamlit#14610`

[1]: https://github.com/streamlit/streamlit/issues/14622 "bug: @st.cache_data hash collision via fixed sampling seed and PIL P-mode palette omission · Issue #14622 · streamlit/streamlit · GitHub"
[2]: https://github.com/streamlit/streamlit/pull/14610 "fix(caching): derive sampling seed from data to prevent deterministic hash collisions by 3em0 · Pull Request #14610 · streamlit/streamlit · GitHub"
Source: https://github.com/streamlit/streamlit/issues/14622
User: Dem0 (UID 82596)
Submitted: 2026-05-16 01:30 PM (19 days ago)
Moderation: 2026-06-04 07:10 AM (19 days later)
Status: Accepted
VulDB entry: 368253 [Streamlit up to 1.53.0 Palette hashing.py weak encryption]
Points: 20
