CVE-2026-49461
Summary
by MITRE • 06/22/2026
pypdf is a free and open-source pure-Python PDF library. In versions prior to 6.12.2, an attacker can craft a PDF that causes large memory usage when it is processed. Triggering the issue requires extracting the text of a page that contains a form XObject with self-references. The vulnerability is fixed in 6.12.2.
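As a rough way to check whether an environment falls in the affected range, the sketch below compares an installed pypdf version string against 6.12.2. The helper names `parse_version` and `is_affected` are hypothetical and not part of pypdf; only the leading numeric components are compared, so pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First fixed release according to the advisory text.
FIXED = (6, 12, 2)


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Simplification (assumption): suffixes such as "rc1" or ".dev0" are dropped,
    so "6.12.2rc1" is treated the same as "6.12.2".
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions so that "6.12" compares as 6.12.0.
    return parts + (0,) * (3 - len(parts))


def is_affected(version_text):
    """True if the version is older than the first fixed release."""
    return parse_version(version_text) < FIXED


if __name__ == "__main__":
    try:
        installed = version("pypdf")
    except PackageNotFoundError:
        print("pypdf is not installed")
    else:
        state = "affected" if is_affected(installed) else "not affected"
        print(f"pypdf {installed}: {state}")
```

A check like this only reports the version in the current environment; upgrading to 6.12.2 or later remains the actual remediation.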
Analysis
by VulDB Data Team • 06/22/2026
pypdf versions prior to 6.12.2 contain a memory consumption vulnerability that creates a denial-of-service risk for applications that rely on PDF text extraction. The flaw is triggered when text is extracted from a page containing a form XObject with a self-referential structure: a maliciously crafted PDF can cause excessive memory allocation during extraction. The root cause is inadequate handling of recursive references within form XObjects, which lets an attacker construct files that drive unbounded or extremely deep recursive processing during text parsing. When an application extracts text from such a page, the library repeatedly re-processes the same content and can consume substantial system memory. This is a familiar pattern of unbounded recursion combined with insufficient input validation, and parsing libraries that handle complex, nested document structures are especially exposed to it.
Technically, the text extraction algorithm fails to detect and terminate recursive references within form XObjects. When a form XObject references itself, either directly or through a chain of other form XObjects, the parsing logic keeps following the references without a termination condition, so memory use grows until extraction is aborted or the process runs out of memory. The issue is classified as CWE-400, "Uncontrolled Resource Consumption," and resembles the infinite-recursion bugs found in other parsing libraries. The operational impact can extend beyond the extraction call itself: an application running a vulnerable version may crash, or destabilize its host, once memory is exhausted. An attacker needs only to supply a PDF whose form XObjects are self-referential; routine text extraction then does the rest.
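The missing termination condition can be illustrated with a small, self-contained model. This is not pypdf's code or its actual fix: form XObjects are represented here as a plain dictionary (an assumption made for illustration), and the hypothetical `extract_text` function tracks which forms are currently being expanded so that a self-reference is skipped instead of followed again. A depth cap is included as a second guard against very deep, non-cyclic nesting.

```python
def extract_text(forms, name, _active=None, max_depth=32):
    """Collect text from a form and the forms it references.

    `forms` maps a form name to {"text": str, "children": [names]}.
    `_active` holds the forms on the current expansion path; meeting one
    of them again means the structure is self-referential.
    """
    if _active is None:
        _active = set()
    if name in _active:
        # Cycle detected: do not expand the same form inside itself.
        return ""
    if len(_active) >= max_depth:
        raise RecursionError(f"form nesting deeper than {max_depth}")

    form = forms[name]
    _active.add(name)
    try:
        pieces = [form["text"]]
        for child in form["children"]:
            pieces.append(extract_text(forms, child, _active, max_depth))
    finally:
        # Leaving this form: it may legitimately appear again elsewhere.
        _active.discard(name)
    return "".join(pieces)


# Fm0 references itself directly and indirectly through Fm1.
forms = {
    "Fm0": {"text": "A", "children": ["Fm1", "Fm0"]},
    "Fm1": {"text": "B", "children": ["Fm0"]},
}
print(extract_text(forms, "Fm0"))  # prints "AB"
```

Without the `_active` check, the call above would recurse until Python's recursion limit is hit, accumulating intermediate results along the way. Tracking only the current path, and not every form ever seen, still allows a form to be reused legitimately in several places on a page.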
The security implications are most serious for web applications and services that process untrusted PDF uploads, where the flaw offers a route to resource exhaustion attacks against service availability. Systems that use pypdf for document processing, content analysis, or automated PDF handling are exposed to denial-of-service conditions in which a single malicious PDF can exhaust the memory available to the process handling it and, on a shared host, starve neighboring processes. Organizations that rely on the library for legitimate PDF processing should account for the risk of cascading failures when handling documents from untrusted sources. The fix in version 6.12.2 addresses the issue by introducing recursion detection and limits in the text extraction process that prevent self-referential form XObjects from being traversed beyond reasonable bounds. The attack maps to ATT&CK technique T1499.004, "Endpoint Denial of Service: Application or System Exploitation," and underscores the importance of input validation and resource management in parsing libraries, particularly those that handle complex formats like PDF, where nested and recursive structures are common.
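For services that cannot upgrade immediately, one defense-in-depth option is to run text extraction in a child process with a memory cap and a timeout, so that a malicious PDF takes down only the worker. The sketch below assumes a Unix-like host (it relies on the standard-library `resource` module and `RLIMIT_AS`, which is not enforced consistently on every platform). The helper names `run_limited` and `extract_text_isolated` and the default limits are hypothetical; the worker uses pypdf's `PdfReader` and `extract_text`.

```python
import resource
import subprocess
import sys

# Worker script run in the child process; it needs pypdf installed.
WORKER = """
import sys
from pypdf import PdfReader

reader = PdfReader(sys.argv[1])
for page in reader.pages:
    sys.stdout.write(page.extract_text())
"""


def _cap_address_space(max_bytes):
    """Lower the soft address-space limit of the current (child) process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        # The soft limit may not exceed the existing hard limit.
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))


def run_limited(argv, max_bytes=1 << 30, timeout=30):
    """Run a command with a memory cap; raises TimeoutExpired on timeout."""
    return subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout,
        # Executed in the child just before exec, so only the child is capped.
        preexec_fn=lambda: _cap_address_space(max_bytes),
    )


def extract_text_isolated(pdf_path, max_bytes=1 << 30, timeout=30):
    """Extract text in a capped child process (illustrative defaults)."""
    result = run_limited([sys.executable, "-c", WORKER, pdf_path],
                         max_bytes, timeout)
    if result.returncode != 0:
        raise RuntimeError(
            f"extraction failed or hit the memory cap (exit {result.returncode})"
        )
    return result.stdout
```

The limits shown (1 GiB, 30 seconds) are placeholders to be tuned per workload. Isolation of this kind contains the impact of the flaw but does not remove it, so it complements the upgrade to 6.12.2 and does not replace it.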