CVE-2026-23907 in PDFBox
Summary
by MITRE • 03/10/2026
This issue affects the ExtractEmbeddedFiles example in Apache PDFBox: from 2.0.24 through 2.0.35, from 3.0.0 through 3.0.6.
The ExtractEmbeddedFiles example contains a path traversal vulnerability (CWE-22) because the filename that is obtained from PDComplexFileSpecification.getFilename() is appended to the extraction path.
Users who have copied this example into their production code should review it to ensure that the extraction path is acceptable. The example has been changed accordingly: the initial path and the extraction paths are now converted into canonical paths, and it is verified that the extraction path contains the initial path. The documentation has also been adjusted.
Analysis
by VulDB Data Team • 05/28/2026
CVE-2026-23907 affects the ExtractEmbeddedFiles example shipped with Apache PDFBox, in versions 2.0.24 through 2.0.35 and 3.0.0 through 3.0.6. It is a path traversal vulnerability (CWE-22), a class of flaw in which externally controlled input is used to build a file system path without confining the result to an intended directory. Here the input is the filename returned by PDComplexFileSpecification.getFilename(), which the example appends to the extraction path without validation. Because that filename comes from the PDF being processed, a crafted document can cause embedded files to be written outside the intended extraction directory.
The flaw stems from how the example builds the output path during embedded file extraction. When PDComplexFileSpecification.getFilename() returns a filename, the code appends it to the extraction path without checking that the result stays inside that path. A filename containing directory traversal sequences such as "../" or "..\" can therefore resolve to a location elsewhere on the file system. The result is a write primitive rather than a read primitive: the attacker chooses where the contents of the embedded file are written, limited by the permissions of the process that runs the extraction.
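The effect of appending an unchecked filename can be illustrated with a small sketch that uses only the Java standard library. It is not the PDFBox example code: the class TraversalDemo, its methods, and the sample filenames are hypothetical, and a plain string stands in for the value that PDComplexFileSpecification.getFilename() would return.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical demo class, not part of PDFBox.
final class TraversalDemo {

    // Mirrors the vulnerable pattern: the embedded filename is appended
    // to the extraction directory without any validation.
    static File naiveTarget(String extractionDir, String embeddedName) {
        return new File(extractionDir + File.separator + embeddedName);
    }

    // Returns true when the resolved target is no longer located under
    // the extraction directory once both paths are made canonical.
    static boolean escapes(String extractionDir, String embeddedName) throws IOException {
        String base = new File(extractionDir).getCanonicalPath();
        String target = naiveTarget(extractionDir, embeddedName).getCanonicalPath();
        return !target.startsWith(base + File.separator);
    }

    public static void main(String[] args) throws IOException {
        // Assumed extraction directory; it does not need to exist for this check.
        String dir = System.getProperty("java.io.tmpdir") + File.separator + "extract";
        System.out.println("report.txt escapes: " + escapes(dir, "report.txt"));
        System.out.println("../evil.txt escapes: " + escapes(dir, "../evil.txt"));
    }
}
```

With an ordinary name the target stays under the extraction directory, while the name containing "../" resolves to the parent directory instead.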
The main impact is writing or overwriting files outside the extraction directory. Depending on the permissions of the process and the locations it can reach, this could mean overwriting configuration files, scripts, or other application data, and it may lead to system compromise when the application runs with elevated privileges. Because the flaw is in example code, exposure depends on whether that code was copied into an application; production systems that adopted it without a security review are the main concern. The risk is greatest when the application processes untrusted PDF files from external sources, since the embedded file specifications in those documents can carry filenames crafted to exploit the flaw.
Organizations that have copied the ExtractEmbeddedFiles example from an affected version should review that code and apply the same change that was made to the updated example. The fix converts both the initial path and each extraction path to canonical form, then verifies that the extraction path contains the initial path, so that every extracted file stays inside the designated extraction directory. Organizations should also review other code that builds file paths from document-supplied names and consider additional input validation, for example rejecting names that contain path separators.
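The canonicalize-and-verify approach described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual change made to the PDFBox example: the class SafeExtraction and the method resolveInside are hypothetical names, and the containment check compares path components with Path.startsWith rather than raw strings, so that a sibling directory such as "extract-old" is not mistaken for a location inside "extract".

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of PDFBox.
final class SafeExtraction {

    // Resolves embeddedName against extractionDir and rejects any result
    // that does not lie strictly inside the canonical extraction directory.
    static File resolveInside(File extractionDir, String embeddedName) throws IOException {
        File base = extractionDir.getCanonicalFile();
        File target = new File(base, embeddedName).getCanonicalFile();

        // Component-wise prefix check; also reject a name that resolves
        // to the extraction directory itself (for example "." or "").
        if (!target.toPath().startsWith(base.toPath()) || target.equals(base)) {
            throw new IOException("Embedded filename escapes extraction directory: " + embeddedName);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Assumed extraction directory; it does not need to exist for this check.
        File dir = new File(System.getProperty("java.io.tmpdir"), "extract");
        System.out.println(resolveInside(dir, "report.txt"));
        try {
            resolveInside(dir, "../evil.txt");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A caller would write the embedded file only to the File returned by resolveInside and treat the IOException as a signal to skip or report the attachment.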