CVE-2026-24009 in docling-core
Summary
by MITRE • 01/22/2026
Docling Core (or docling-core) is a library that defines core data types and transformations in the document processing application Docling. A PyYAML-related Remote Code Execution (RCE) vulnerability, namely CVE-2020-14343, is exposed in docling-core starting in version 2.21.0 and prior to version 2.48.4, specifically only if the application uses pyyaml prior to version 5.4 and invokes `docling_core.types.doc.DoclingDocument.load_from_yaml()` passing it untrusted YAML data. The vulnerability has been patched in docling-core version 2.48.4. The fix mitigates the issue by switching `PyYAML` deserialization from `yaml.FullLoader` to `yaml.SafeLoader`, ensuring that untrusted data cannot trigger code execution. Users who cannot immediately upgrade docling-core can alternatively ensure that the installed version of PyYAML is 5.4 or greater.
Analysis
by VulDB Data Team • 01/22/2026
CVE-2026-24009 is a remote code execution flaw in the Docling Core library, a foundational component of the Docling document processing application. It is not a new bug in its own right: it exposes a known PyYAML issue, CVE-2020-14343, through the docling-core API. The flaw is reachable when an application runs docling-core 2.21.0 through 2.48.3 together with a PyYAML release older than 5.4 and passes untrusted YAML to the `docling_core.types.doc.DoclingDocument.load_from_yaml()` method. When all of these conditions hold, an attacker who controls the YAML input can execute arbitrary code on the system that processes it.
The root cause is unsafe YAML deserialization in docling-core. In the affected versions, `load_from_yaml()` deserializes its input with `yaml.FullLoader`. In PyYAML releases prior to 5.4, `FullLoader` still resolves Python-specific tags that can instantiate arbitrary Python objects (the flaw tracked as CVE-2020-14343), so a crafted YAML payload can run attacker-chosen code while it is being parsed. The exposure applies only where untrusted YAML reaches the `load_from_yaml()` method; whether an attacker can deliver such input remotely depends on how the surrounding application accepts documents.
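The difference between the loaders can be illustrated with a small sketch. It is not the docling-core code: `parse_untrusted_yaml` is a hypothetical helper, and the tagged input is a deliberately harmless stand-in for an attacker payload. The sketch assumes PyYAML is installed and only exercises `yaml.SafeLoader`, which builds plain Python types and refuses Python-specific tags.

```python
import yaml


def parse_untrusted_yaml(text):
    """Parse YAML with SafeLoader (hypothetical helper, not docling-core code).

    SafeLoader only constructs plain types such as dict, list, str and int.
    Any tag it does not know, including Python-specific ones, raises an error.
    """
    try:
        return yaml.load(text, Loader=yaml.SafeLoader)
    except yaml.YAMLError as exc:
        raise ValueError(f"rejected YAML input: {exc}") from exc


# Ordinary document data parses into plain Python containers.
PLAIN = "name: doc\npages: [1, 2]\n"

# Harmless stand-in for a malicious payload: a Python-specific tag that asks
# the loader to call a function. SafeLoader refuses to construct it.
TAGGED = "!!python/object/apply:os.getcwd []\n"

if __name__ == "__main__":
    print(parse_untrusted_yaml(PLAIN))
    try:
        parse_untrusted_yaml(TAGGED)
    except ValueError as err:
        print("refused:", type(err).__name__)
```

The design point is that the safe loader fails closed: input carrying an unexpected tag is rejected outright instead of being partially interpreted.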
The operational impact goes beyond the initial code execution, because the attacker's code runs with the privileges of the document processing application. Organizations that use docling-core to ingest untrusted documents risk compromise of the host running the pipeline and, from there, of connected systems. An attacker could use the flaw to gain unauthorized access, attempt privilege escalation, or install a persistent backdoor in the processing pipeline. Because the exposure sits in `load_from_yaml()`, any application that passes external or user-provided YAML to this method is affected as long as it also runs an affected docling-core version with PyYAML older than 5.4.
Mitigation consists of either upgrading to docling-core version 2.48.4 or ensuring that the installed PyYAML is version 5.4 or greater. The patch in version 2.48.4 switches deserialization from `yaml.FullLoader` to `yaml.SafeLoader`, which prevents the instantiation of arbitrary Python objects during YAML parsing. This is consistent with the guidance associated with CWE-502 (Deserialization of Untrusted Data). Organizations that rely on the PyYAML workaround should pin the PyYAML version in their dependency management so that a later dependency resolution does not downgrade it below 5.4 and reintroduce the exposure. On the attacker side, exploitation maps most closely to ATT&CK technique T1203 (Exploitation for Client Execution), since the code runs when the victim application parses a malicious document.
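The version conditions above can be checked in a deployed environment with a short script. The following is a minimal sketch using only the standard library; `release_tuple`, `is_exposed` and `installed_version` are hypothetical helper names, and the version parsing is deliberately simple (it ignores pre-release suffixes, so a string such as `5.4b1` is treated like `5.4`).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(ver):
    """Turn '5.3.1' into (5, 3, 1), padding short versions with zeros."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    parts = [int(p) for p in match.group(0).split(".")] if match else []
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_exposed(docling_core_ver, pyyaml_ver):
    """Apply the advisory's conditions: docling-core >= 2.21.0 and < 2.48.4,
    combined with PyYAML < 5.4."""
    core = release_tuple(docling_core_ver)
    pyyaml = release_tuple(pyyaml_ver)
    return (2, 21, 0) <= core < (2, 48, 4) and pyyaml < (5, 4, 0)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    core = installed_version("docling-core")
    pyyaml = installed_version("PyYAML")
    if core is None or pyyaml is None:
        print("docling-core or PyYAML is not installed in this environment")
    else:
        print("exposed" if is_exposed(core, pyyaml) else "not exposed")
```

A result of "not exposed" only reflects the installed versions. It does not show whether the application passes untrusted YAML to `load_from_yaml()`, which still needs to be reviewed separately.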