
Diffoscope 156 throws an exception on a valid XML file with entities

Hello diffoscope maintainers,

diffoscope throws an exception when it processes a valid XML file that contains entities. (This issue might be related to #166.)

Sample file: fontdata.xml from the Debian package khmerconverter, listed at https://packages.debian.org/buster/all/khmerconverter/filelist

$ diffoscope --version

diffoscope 156

I have the recommended package python3-defusedxml installed.

$ cd /usr/share/khmerconverter

$ diffoscope fontdata.xml fontdata.xml

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/diffoscope/main.py", line 747, in main
    sys.exit(run_diffoscope(parsed_args))
  File "/usr/lib/python3/dist-packages/diffoscope/main.py", line 701, in run_diffoscope
    difference = compare_root_paths(path1, path2)
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/utils/compare.py", line 71, in compare_root_paths
    file1 = specialize(FilesystemFile(path1, container=container1))
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/utils/specialize.py", line 55, in specialize
    if try_recognize(file, cls, cls.recognizes):
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/utils/specialize.py", line 38, in try_recognize
    if not recognizes(file):
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/xml.py", line 98, in recognizes
    file.parsed = _parse(f)
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/xml.py", line 64, in _parse
    xml = minidom.parse(file)
  File "/usr/lib/python3/dist-packages/defusedxml/minidom.py", line 22, in parse
    return _expatbuilder.parse(
  File "/usr/lib/python3/dist-packages/defusedxml/expatbuilder.py", line 90, in parse
    result = builder.parseFile(file)
  File "/usr/lib/python3.8/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
  File "../Modules/pyexpat.c", line 481, in EntityDecl
  File "/usr/lib/python3/dist-packages/defusedxml/expatbuilder.py", line 35, in defused_entity_decl
    raise EntitiesForbidden(name, value, base, sysid, pubid, notation_name)
defusedxml.common.EntitiesForbidden: EntitiesForbidden(name='mark', system_id=None, public_id=None)

The reason defusedxml rejects entities is explained at https://pypi.org/project/defusedxml/
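For comparison, here is a minimal sketch that uses only the Python standard library. The stock `xml.dom.minidom` parser accepts an internal entity declaration like the `mark` entity from the traceback and expands it. This suggests the document is well-formed and that the rejection comes from defusedxml's policy rather than from a parse error. The document below is a hypothetical stand-in for fontdata.xml: the element names and the entity value `*` are made up.

```python
from xml.dom import minidom

# Hypothetical stand-in for fontdata.xml: an internal entity named "mark"
# declared in the DTD internal subset. The value "*" is made up.
DOC = """<?xml version="1.0"?>
<!DOCTYPE fonts [
  <!ENTITY mark "*">
]>
<fonts>&mark;</fonts>"""

# The standard-library parser does not forbid entities; expat expands
# internal entity references into ordinary character data.
dom = minidom.parseString(DOC)
root = dom.documentElement
text = "".join(n.data for n in root.childNodes if n.nodeType == n.TEXT_NODE)
print(text)
```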

When I remove the file extension, I get no exception:

$ cp fontdata.xml fontdata

$ diffoscope fontdata fontdata

Could the XML comparator fall back to the default text comparator when parsing the file raises an exception?
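A minimal sketch of the kind of fallback I mean, in Python. The traceback shows that `recognizes` in comparators/xml.py calls `_parse`, which calls `minidom.parse`, and that `EntitiesForbidden` escapes from it. `try_parse_xml` and `recognizes_xml` below are hypothetical helpers, not diffoscope's actual code. The idea is that any parse failure marks the file as "not XML", so it gets compared as text instead. This relies on defusedxml's exceptions deriving from `ValueError`, and it falls back to the standard-library parser if defusedxml is not installed.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

try:
    # Prefer the hardened parser when it is installed.
    from defusedxml import minidom as safe_minidom
except ImportError:
    safe_minidom = minidom


def try_parse_xml(path):
    """Hypothetical helper: return the parsed DOM, or None if parsing fails.

    Returning None instead of raising lets the caller treat the file as
    "not XML", so it can fall through to the text comparator.
    """
    try:
        with open(path, "rb") as f:
            return safe_minidom.parse(f)
    except (ExpatError, ValueError):
        # ExpatError: malformed XML.
        # ValueError: base class of defusedxml's EntitiesForbidden & co.
        return None


def recognizes_xml(path):
    """Hypothetical recognizer: True only if the file parses safely."""
    return try_parse_xml(path) is not None
```

Catching `ValueError` rather than only `EntitiesForbidden` also covers defusedxml's other refusals, such as `DTDForbidden` and `ExternalReferenceForbidden`, which share the same `DefusedXmlException` base.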

With kind regards, Roland Clobus
