
[Medium] XML parsing via old versions of Python's xml.dom.minidom module is vulnerable to entity expansion attacks

Issue details

diffoscope uses xml.dom.minidom from Python's standard library to parse XML DOM content if the safer defusedxml package is not installed. The standard library module is vulnerable to two kinds of denial-of-service (DoS) attack, entity expansion and large tokens, if the version of the underlying C library expat is not recent enough.
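A fallback of this kind is commonly written as a guarded import. The following is a minimal sketch of the pattern, not diffoscope's actual code; it relies only on defusedxml shipping a minidom-compatible module (defusedxml.minidom).

```python
# Prefer the hardened parser; fall back to the standard library if it is absent.
try:
    from defusedxml import minidom  # rejects entity declarations by default
    USING_DEFUSEDXML = True
except ImportError:
    from xml.dom import minidom  # safety depends on the bundled expat version
    USING_DEFUSEDXML = False

# Both modules expose the same parse()/parseString() entry points.
document = minidom.parseString("<root><child>text</child></root>")
print(document.documentElement.tagName, USING_DEFUSEDXML)
```

With this pattern the calling code is identical in both cases, which is why the weaker parser can end up in use without the user noticing.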

As described in Python's module documentation, versions of expat <2.4.1 (released on 2021-05-23) are vulnerable to exponential/quadratic entity expansion, while versions <2.6.0 (released on 2024-02-06) are vulnerable to large tokens. Although expat is usually bundled with Python, the exact expat version shipped with a given Python version is hard to determine, because backported security releases can bump the versions of bundled libraries.

Risk

For affected versions, the following example XML file leads to excessive parsing times with high memory usage:

<!DOCTYPE xmlbomb [
<!ENTITY a "1234567890" >
<!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;">
<!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;">
<!ENTITY d "&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;">
<!ENTITY e "&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;">
<!ENTITY f "&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;">
<!ENTITY g "&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;">
<!ENTITY h "&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;">
<!ENTITY i "&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;&h;">
]>
<bomb>&i;</bomb>

On resource-constrained systems this can lead to execution timeouts or even crashes, thus enabling a DoS attack against diffoscope.
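To get a feel for the amplification, the fully expanded size of each entity can be computed without expanding anything. This is a rough sketch: the function name and the regular expressions are illustrative, and it assumes simple double-quoted internal entities that are declared before they are referenced, as in the example above.

```python
import re

# Matches declarations such as: <!ENTITY b "&a;&a;">
ENTITY_RE = re.compile(r'<!ENTITY\s+(\w+)\s+"([^"]*)"\s*>')
# Matches references such as: &a;
REF_RE = re.compile(r"&(\w+);")


def expanded_lengths(dtd_text):
    """Return {entity name: fully expanded length} without expanding anything."""
    lengths = {}
    for name, value in ENTITY_RE.findall(dtd_text):
        # Literal characters are what remains once the references are removed.
        literal = len(REF_RE.sub("", value))
        # Each reference contributes the already computed length of its target.
        # An undeclared or predefined entity (e.g. &amp;) would raise KeyError.
        referenced = sum(lengths[ref] for ref in REF_RE.findall(value))
        lengths[name] = literal + referenced
    return lengths


# A deliberately tiny sample with three references per level.
sample = """
<!ENTITY a "1234567890" >
<!ENTITY b "&a;&a;&a;">
<!ENTITY c "&b;&b;&b;">
"""
sizes = expanded_lengths(sample)
print(sizes)  # {'a': 10, 'b': 30, 'c': 90}
```

Each level multiplies the size by its number of references. If each of the eight levels b through i in the file above holds 40 references, the text behind &i; would be 10 × 40^8, roughly 6.6 × 10^13 characters, from an input of only a few kilobytes.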

Mitigation

The issue is fixed by using a recent version of expat. Python versions >= 3.13.0 are safe to use, whereas earlier versions still bundle a vulnerable version of expat in their initial 3.x.0 release. However, as stated previously, later bugfix releases of those versions have bumped the bundled expat and could be safe.

Nonetheless, diffoscope should detect whether a vulnerable version is in use by checking pyexpat.EXPAT_VERSION and either (1) abort safely, or (2) continue with a warning message that informs the user of the potential risks. In both cases the user should be told how to mitigate the issue: either by installing defusedxml (preferred) or by upgrading expat.
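A minimal sketch of such a check follows. It is not diffoscope's implementation: the function names, the message text, and the abort flag are hypothetical, and it assumes pyexpat.EXPAT_VERSION has the usual form "expat_X.Y.Z". The thresholds are the ones given above.

```python
import re
import warnings

import pyexpat

# First expat releases that fix each issue, per the Python documentation.
ENTITY_EXPANSION_FIXED = (2, 4, 1)
LARGE_TOKENS_FIXED = (2, 6, 0)


def parse_expat_version(version_string):
    """Turn a string such as 'expat_2.6.2' into a tuple such as (2, 6, 2)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised expat version: {version_string!r}")
    return tuple(int(part) for part in match.groups())


def expat_risks(version):
    """List the attack classes that the given expat version is exposed to."""
    risks = []
    if version < ENTITY_EXPANSION_FIXED:
        risks.append("entity expansion")
    if version < LARGE_TOKENS_FIXED:
        risks.append("large tokens")
    return risks


def check_expat(abort=False):
    """Warn, or exit when abort is true, if the bundled expat is vulnerable."""
    risks = expat_risks(parse_expat_version(pyexpat.EXPAT_VERSION))
    if not risks:
        return
    message = (
        f"{pyexpat.EXPAT_VERSION} is vulnerable to {' and '.join(risks)}; "
        "install defusedxml (preferred) or upgrade expat"
    )
    if abort:
        raise SystemExit(message)
    warnings.warn(message)


if __name__ == "__main__":
    check_expat()  # option (2): continue with a warning
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as "2.10.0" sorting before "2.4.1". The check is only relevant on the fallback path, when defusedxml is not installed.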

EDIT: adjusted the version comparisons to reflect that Expat 2.6.0 is no longer vulnerable.

Edited by Florian Wilkens