CVE-2026-0847 in nltk

Summary

by MITRE • 03/04/2026

A vulnerability in NLTK versions up to and including 3.9.2 allows arbitrary file read via path traversal in multiple CorpusReader classes, including WordListCorpusReader, TaggedCorpusReader, and BracketParseCorpusReader. These classes fail to properly sanitize or validate file paths, enabling attackers to traverse directories and access sensitive files on the server. This issue is particularly critical in scenarios where user-controlled file inputs are processed, such as in machine learning APIs, chatbots, or NLP pipelines. Exploitation of this vulnerability can lead to unauthorized access to sensitive files, including system files, SSH private keys, and API tokens, and may potentially escalate to remote code execution when combined with other vulnerabilities.
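The affected range stated above (up to and including 3.9.2) can be checked programmatically. The following is a minimal sketch, not an official detection tool: the helper names `parse_version`, `is_affected`, and `installed_nltk_is_affected` are hypothetical, and the comparison looks only at the leading numeric release components, so a pre-release such as 3.9.2rc1 is treated like 3.9.2.

```python
import re
from importlib import metadata

# Last affected release according to the description above.
LAST_AFFECTED = (3, 9, 2)


def parse_version(version: str) -> tuple:
    """Return the leading numeric release components, e.g. '3.9.2rc1' -> (3, 9, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions so that '3.9' compares as (3, 9, 0).
    return parts + (0,) * (3 - len(parts))


def is_affected(version: str) -> bool:
    """True if the version is at or below the last affected release."""
    return parse_version(version)[:3] <= LAST_AFFECTED


def installed_nltk_is_affected():
    """Check the nltk distribution installed in this environment.

    Returns None when nltk is not installed.
    """
    try:
        return is_affected(metadata.version("nltk"))
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_nltk_is_affected()` in a deployment's own interpreter reports on that environment only; container images and virtual environments need to be checked separately.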


Analysis

by VulDB Data Team • 03/07/2026

CVE-2026-0847 is a path traversal flaw in the Natural Language Toolkit (NLTK) library affecting versions through 3.9.2. The weakness resides in multiple CorpusReader classes, including WordListCorpusReader, TaggedCorpusReader, and BracketParseCorpusReader, which are commonly used to process linguistic data in natural language processing applications. The core issue is insufficient validation and sanitization of file paths during corpus processing, which lets an attacker steer file access beyond the intended directory boundaries.

Attackers exploit the missing path validation by supplying file paths that contain directory traversal sequences such as "../" or "..\". The affected CorpusReader classes process these paths without sanitizing them, so a request can resolve to a file outside the intended corpus directory and expose sensitive system resources. The flaw mainly affects applications that pass user-supplied data into NLTK's corpus reading functionality, where an attacker-controlled file path bypasses the access restrictions the application expects. The vulnerability is classified as CWE-22 (Improper Limitation of a Pathname to a Restricted Directory, "Path Traversal").
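The effect of a traversal sequence can be illustrated without NLTK at all. The sketch below is a generic demonstration of the unsafe pattern (joining a root directory with an untrusted path and performing no containment check); it does not reproduce NLTK's internal code, and `naive_join` and the directory `/srv/app/corpora` are hypothetical names chosen for illustration. `posixpath` is used so the result is the same on any platform.

```python
import posixpath


def naive_join(root: str, user_supplied: str) -> str:
    """Join and normalise a path with no containment check (the unsafe pattern)."""
    return posixpath.normpath(posixpath.join(root, user_supplied))


corpus_root = "/srv/app/corpora"  # hypothetical corpus directory

# A well-behaved identifier stays under the corpus root.
benign = naive_join(corpus_root, "words/en.txt")

# "../" sequences climb out of the corpus root.
hostile = naive_join(corpus_root, "../../../etc/passwd")

# An absolute path discards the root entirely when joined.
absolute = naive_join(corpus_root, "/etc/passwd")

print(benign)    # /srv/app/corpora/words/en.txt
print(hostile)   # /etc/passwd
print(absolute)  # /etc/passwd
```

Both hostile inputs resolve to the same file outside the corpus root, which is why filtering only the literal string "../" is not a sufficient defence.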

The operational impact extends beyond simple data exposure and may contribute to more severe consequences, up to complete system compromise. In machine learning APIs, chatbots, and NLP pipelines where user input is processed through NLTK, attackers can use this vulnerability to read critical files, including configuration files, authentication tokens, SSH private keys, and database credentials. The risk is elevated in cloud environments and containerized applications where NLTK processes external inputs, because successful exploitation can expose secrets for other infrastructure components. The vulnerability can also serve as a stepping stone for further attacks, potentially enabling privilege escalation or lateral movement within a compromised environment. The file access it provides corresponds to ATT&CK technique T1083 (File and Directory Discovery).

Mitigation for CVE-2026-0847 should start with an immediate update to NLTK 3.9.3 or later, where the path traversal vulnerabilities have been addressed through input sanitization and validation. Organizations should add further defensive measures: validating input at the application level, restricting file system permissions for the components that run NLTK, and sandboxing corpus processing operations. Network segmentation and monitoring should be in place to detect anomalous file access patterns that might indicate exploitation attempts. Security teams should also review application code for other instances of path traversal in code that uses NLTK or similar libraries, so that input validation covers all file processing operations. After patching, test that the updated version works correctly and that the intended corpus processing capabilities are preserved.
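Application-level input validation, as recommended above, can be sketched as a containment check performed before any path reaches a corpus reader. This is a minimal illustration under stated assumptions, not the fix shipped in NLTK: `resolve_inside`, `is_safe`, `UnsafePathError`, and the directory `/srv/app/corpora` are hypothetical names, and the check relies on `pathlib.Path.resolve()` to collapse ".." components and follow symlinks before comparing against the allowed root.

```python
from pathlib import Path


class UnsafePathError(ValueError):
    """Raised when a requested path resolves outside the allowed root."""


def resolve_inside(root, user_supplied):
    """Resolve user_supplied under root, or raise UnsafePathError.

    Both paths are fully resolved first, so '..' components, absolute
    paths, and symlinks pointing outside the root are all rejected.
    """
    root_path = Path(root).resolve()
    candidate = (root_path / user_supplied).resolve()
    try:
        candidate.relative_to(root_path)
    except ValueError:
        raise UnsafePathError(
            f"{user_supplied!r} resolves outside {str(root_path)!r}"
        ) from None
    return candidate


def is_safe(root, user_supplied) -> bool:
    """Boolean convenience wrapper around resolve_inside."""
    try:
        resolve_inside(root, user_supplied)
    except UnsafePathError:
        return False
    return True
```

A caller would validate each file identifier with `resolve_inside(corpus_root, name)` and hand only the returned path to the corpus reader. An allowlist of known corpus file names is stricter still and may be preferable where the set of files is fixed.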

Responsible

@huntr Ai

Reservation

01/11/2026

Disclosure

03/04/2026

Moderation

accepted

CPE

ready

EPSS

0.00080

KEV

no

Activities

very low
