CVE-2018-19787 in lxml

Summary

by MITRE

An issue was discovered in lxml before 4.2.5. lxml/html/clean.py in the lxml.html.clean module does not remove javascript: URLs that use escaping, allowing a remote attacker to conduct XSS attacks, as demonstrated by "j a v a s c r i p t:" in Internet Explorer. This is a similar issue to CVE-2014-3146.

Analysis

by VulDB Data Team • 12/18/2025

CVE-2018-19787 is a cross-site scripting (XSS) weakness in the lxml library affecting versions before 4.2.5. The flaw resides in the lxml.html.clean module, which is designed to sanitize HTML by removing potentially dangerous elements and attributes. The cleaner does not remove javascript: URLs that use escaping, so an attacker can slip a script URL past a control that is meant to prevent XSS. Because lxml is a widely used Python library and the cleaner is typically applied to untrusted HTML input, the issue matters for any web application that relies on it for HTML sanitization.

The flaw is in lxml/html/clean.py, where the sanitization logic fails to recognize and remove javascript: URLs that use escaping to evade detection. The demonstrated case is a URL whose scheme is broken up by extra characters, such as "j a v a s c r i p t:", which Internet Explorer still treats as a javascript: URL. Content carrying such a URL survives cleaning, and the script then runs in the victim's browser in the context of the site that served it, enabling actions such as cookie theft, session hijacking, or redirection to malicious sites. In short, the cleaner strips the literal form of a dangerous URL but does not account for obfuscated forms that remain executable.
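The bypass described above can be illustrated with a small sketch. This is not lxml's actual code or its actual fix; the function names and the set of characters treated as ignorable are assumptions made for illustration only. It shows why a literal prefix check misses a scheme that has been broken up with spaces, and how normalizing the URL before the check closes that gap.

```python
import re

# Characters a lenient browser might ignore inside a URL scheme.
# Assumption for illustration: ASCII whitespace and control characters.
_IGNORABLE = re.compile(r"[\x00-\x20\x7f]+")


def naive_is_javascript_url(url: str) -> bool:
    """Hypothetical naive check: only matches the literal scheme prefix."""
    return url.strip().lower().startswith("javascript:")


def normalized_is_javascript_url(url: str) -> bool:
    """Hypothetical stricter check: drop ignorable characters, then compare."""
    collapsed = _IGNORABLE.sub("", url).lower()
    return collapsed.startswith("javascript:")


if __name__ == "__main__":
    payload = "j a v a s c r i p t:alert(1)"
    print(naive_is_javascript_url(payload))       # False: the payload slips through
    print(normalized_is_javascript_url(payload))  # True: the payload is caught
```

A check of this kind complements, and does not replace, upgrading to a fixed lxml release.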

The operational impact goes beyond a single XSS vector, because the flaw undermines the security assumption of every application that treats lxml's cleaner as a trust boundary. Web applications that process user-generated content, form submissions, or HTML from external sources become exposed to persistent (stored) XSS when running an affected lxml version. An attacker can craft a payload that looks like legitimate content but carries an obfuscated javascript: URL, which may compromise user sessions, expose sensitive information, or redirect users to phishing sites. The risk is highest where HTML sanitization is the only control in place, since output that has been "cleaned" is usually rendered without further checks.

Remediation requires upgrading to lxml 4.2.5 or later, which handles escaped javascript: URLs during cleaning. After upgrading, organizations should test that their applications reject javascript: URLs even when these are obfuscated through escaping. Additional layers of protection are advisable, including a Content Security Policy, input validation, and regular security review of third-party libraries. The vulnerability is classified as CWE-79 (cross-site scripting) and can be mapped to ATT&CK technique T1203 (Exploitation for Client Execution). The issue shows why security controls need to be tested against obfuscated input and why security updates in dependencies need to be tracked continuously.
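As a minimal sketch of the version check implied by the remediation, the following compares an lxml version string against the fixed release 4.2.5. The helper names are hypothetical, and the parser is deliberately simple: it reads only the leading numeric components, so pre-release or local version suffixes are ignored. A real deployment would more likely rely on a dependency scanner or a full version parser.

```python
FIXED_IN = (4, 2, 5)  # first lxml release that contains the fix


def parse_version(version: str) -> tuple:
    """Parse leading numeric components, e.g. '4.2.5' -> (4, 2, 5)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def is_affected(version: str) -> bool:
    """Return True if the given lxml version predates 4.2.5."""
    return parse_version(version) < FIXED_IN


if __name__ == "__main__":
    for candidate in ("4.2.4", "4.2.5", "4.3.0", "3.8.0"):
        status = "affected" if is_affected(candidate) else "not affected"
        print(candidate, status)
```

The installed version can be obtained with `importlib.metadata.version("lxml")` on Python 3.8 or later and passed to `is_affected`.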

Reservation

12/02/2018

Disclosure

12/02/2018

Moderation

accepted

CPE

ready

Exploit

Download

EPSS

0.00525

KEV

no

Activities

very low

Sources
