CVE-2025-2099 in transformers
Summary
by MITRE • 05/19/2025
A vulnerability in the `preprocess_string()` function of the `transformers.testing_utils` module in huggingface/transformers version v4.48.3 allows for a Regular Expression Denial of Service (ReDoS) attack. The regular expression used to process code blocks in docstrings contains nested quantifiers, leading to exponential backtracking when processing input with a large number of newline characters. An attacker can exploit this by providing a specially crafted payload, causing high CPU usage and potential application downtime, effectively resulting in a Denial of Service (DoS) scenario.
Analysis
by VulDB Data Team • 05/19/2025
CVE-2025-2099 affects the `preprocess_string()` function in the `transformers.testing_utils` module of the Hugging Face Transformers library, which processes code blocks found in docstrings. It is a Regular Expression Denial of Service (ReDoS) flaw: in the affected version, v4.48.3, the regular expression that parses those code blocks contains nested quantifiers. When a match attempt fails on crafted input containing many newline characters, the engine backtracks through the many ways the input can be divided between the inner and outer quantifiers. Matching time therefore grows exponentially with input size, and the process can end in resource exhaustion.
The root cause is the nested quantifiers in the regular expression used by the docstring processing logic. When `preprocess_string()` receives input with a large number of consecutive newline characters, the regular expression engine backtracks exponentially, retrying the alternative ways the quantifiers could consume the input before it can report a failed match. The weakness falls under CWE-400 (Uncontrolled Resource Consumption), a broad class; the more specific entry for this pattern is CWE-1333 (Inefficient Regular Expression Complexity). In the ATT&CK framework the relevant technique is T1499, Endpoint Denial of Service (sub-technique T1499.003, Application Exhaustion Flood); Network Denial of Service is a separate technique, T1498. The flaw is particularly dangerous because it can be triggered through normal documentation processing workflows: an attacker needs only to inject a crafted payload into docstring content, either directly or indirectly through a documentation generation process.
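The backtracking behavior described above can be observed with a small timing experiment. The sketch below is illustrative only: the pattern `(\n+)+$` is a hypothetical stand-in chosen because it has the same nested-quantifier shape, and it is not the expression used in `transformers`. The names `NESTED`, `FLAT`, `time_match`, and `adversarial` are likewise made up for this example.

```python
import re
import time

# Illustrative pattern only; NOT the expression used in transformers.
# The inner `\n+` and the outer `(...)+` can both consume the same newlines,
# so a failed match forces the engine to try many ways of grouping them.
NESTED = re.compile(r"(\n+)+$")

# Accepts the same strings with a single quantifier, so there is only one
# way to consume a run of newlines and no exponential backtracking.
FLAT = re.compile(r"\n+$")


def time_match(pattern, text):
    """Return (match_or_None, elapsed_seconds) for pattern.match(text)."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


def adversarial(n):
    """Build n newlines followed by one character that makes the match fail."""
    return "\n" * n + "x"


if __name__ == "__main__":
    # Input sizes are kept small on purpose: each extra newline roughly
    # doubles the time taken by the nested pattern.
    for n in (10, 14, 18, 20):
        _, slow = time_match(NESTED, adversarial(n))
        _, fast = time_match(FLAT, adversarial(n))
        print(f"n={n:2d}  nested={slow:.4f}s  flat={fast:.6f}s")
```

Both patterns reject the adversarial input, but only the nested one pays a cost that grows with each added newline. Absolute timings depend on the machine and the Python version, so none are quoted here.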
The operational impact ranges from degraded performance to complete application downtime, particularly in environments where documentation generation or testing utilities are heavily used. An attacker can place newline characters so as to maximize backtracking and consume excessive CPU time. The systems most exposed are continuous integration pipelines, automated testing environments, and documentation generation systems that rely on the library's testing utilities. The flaw can be triggered in local development environments as well as in any production system that invokes the library's docstring processing, and a stalled job can lead to cascading failures in dependent applications or services. Because the cost of backtracking grows exponentially, even relatively small malicious inputs can consume substantial resources, which makes the flaw an effective denial of service mechanism.
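To put a number on "relatively small", the sketch below counts the ways a run of n characters can be cut into consecutive non-empty groups. This is a simplified model, under the assumption of a pattern shaped like `(X+)+` that ultimately fails to match; the real pattern and engine will differ in detail. The function name `count_splits` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(n):
    """Count the ways to cut n characters into consecutive non-empty groups.

    Each way models one path a backtracking engine may explore for a
    pattern shaped like (X+)+ before it concludes that the match fails.
    """
    if n == 0:
        return 1  # nothing left to split: exactly one way to finish
    # The first group takes k characters (1..n); split the rest recursively.
    return sum(count_splits(n - k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (10, 20, 30, 40):
        print(n, count_splits(n))
    # prints:
    # 10 512
    # 20 524288
    # 30 536870912
    # 40 549755813888
```

The count is 2^(n-1), so under this model a payload of only a few dozen newline characters already implies hundreds of millions of candidate paths.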
Mitigation should start with upgrading to a patched release of the Hugging Face Transformers library in which the vulnerable regular expression has been replaced or restructured to eliminate nested quantifiers. Organizations should also validate and sanitize input in their documentation processing pipelines so that suspicious payloads are rejected before they reach the vulnerable function, and system administrators should apply resource limits and timeouts to documentation generation processes so that a single job cannot exhaust the system. More generally, regular expressions should follow established security practice: use bounded quantifiers, avoid nested quantifiers, and enforce input length limits. After upgrading, documentation generation workflows should be tested to confirm that the patched version resolves the vulnerability without introducing regressions. As further defensive measures in line with ATT&CK guidance, organizations can monitor for unusual CPU usage patterns and establish incident response procedures for ReDoS attacks against documentation processing systems.
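The input validation step might look like the following minimal sketch. It is not part of the `transformers` library: the function names and both limits (`MAX_CHARS`, `MAX_NEWLINE_RUN`) are assumptions made for illustration and would need tuning to the largest legitimate input a given pipeline handles.

```python
MAX_CHARS = 100_000     # assumed cap on total input length
MAX_NEWLINE_RUN = 50    # assumed cap on consecutive newline characters


def longest_newline_run(text):
    """Return the length of the longest run of consecutive '\\n' characters."""
    longest = current = 0
    for ch in text:
        current = current + 1 if ch == "\n" else 0
        longest = max(longest, current)
    return longest


def check_docstring_input(text, max_chars=MAX_CHARS, max_run=MAX_NEWLINE_RUN):
    """Return text unchanged, or raise ValueError if it exceeds either limit.

    The check is a single linear pass and uses no regular expression,
    so it cannot itself be driven into backtracking.
    """
    if len(text) > max_chars:
        raise ValueError(f"input too long: {len(text)} > {max_chars} characters")
    run = longest_newline_run(text)
    if run > max_run:
        raise ValueError(f"suspicious newline run: {run} > {max_run}")
    return text


if __name__ == "__main__":
    print(repr(check_docstring_input("line one\n\nline two")))
    try:
        check_docstring_input("\n" * 1000)
    except ValueError as exc:
        print("rejected:", exc)
```

A guard like this reduces exposure but does not replace the upgrade. Python's standard `re` module accepts no timeout argument, so any time limit has to be enforced from outside the match, for example by running the job in a separate process that can be terminated.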