Diffoscope 156 throws an exception on a valid XML file with entities
Hello diffoscope maintainers,
diffoscope throws an exception when it processes a valid XML file that contains entities. (This issue might be related to #166.)
Sample file: fontdata.xml from the Debian package khmerconverter (see the file list at https://packages.debian.org/buster/all/khmerconverter/filelist)
$ diffoscope --version
diffoscope 156
I have the recommended package python3-defusedxml installed.
$ cd /usr/share/khmerconverter
$ diffoscope fontdata.xml fontdata.xml
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/diffoscope/main.py", line 747, in main
    sys.exit(run_diffoscope(parsed_args))
  File "/usr/lib/python3/dist-packages/diffoscope/main.py", line 701, in run_diffoscope
    difference = compare_root_paths(path1, path2)
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/utils/compare.py", line 71, in compare_root_paths
    file1 = specialize(FilesystemFile(path1, container=container1))
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/utils/specialize.py", line 55, in specialize
    if try_recognize(file, cls, cls.recognizes):
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/utils/specialize.py", line 38, in try_recognize
    if not recognizes(file):
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/xml.py", line 98, in recognizes
    file.parsed = _parse(f)
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/xml.py", line 64, in _parse
    xml = minidom.parse(file)
  File "/usr/lib/python3/dist-packages/defusedxml/minidom.py", line 22, in parse
    return _expatbuilder.parse(
  File "/usr/lib/python3/dist-packages/defusedxml/expatbuilder.py", line 90, in parse
    result = builder.parseFile(file)
  File "/usr/lib/python3.8/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
  File "../Modules/pyexpat.c", line 481, in EntityDecl
  File "/usr/lib/python3/dist-packages/defusedxml/expatbuilder.py", line 35, in defused_entity_decl
    raise EntitiesForbidden(name, value, base, sysid, pubid, notation_name)
defusedxml.common.EntitiesForbidden: EntitiesForbidden(name='mark', system_id=None, public_id=None)
defusedxml rejects entity declarations by default; the reasons are explained at https://pypi.org/project/defusedxml/
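To illustrate, here is a minimal sketch using a hypothetical tiny document (not the real fontdata.xml) that declares a single internal entity, named `mark` to match the traceback. The standard library's `xml.dom.minidom` parses it without complaint, so the document is well-formed. `defusedxml.minidom` instead raises `EntitiesForbidden` as soon as it sees the declaration, because `forbid_entities=True` is its default. The helper `parse_results` is made up for this example.

```python
from xml.dom import minidom as std_minidom

# Hypothetical minimal input: one internal entity named "mark",
# like the entity reported in the traceback above.
DOC = b'<?xml version="1.0"?><!DOCTYPE r [<!ENTITY mark "x">]><r>&mark;</r>'


def parse_results(data):
    """Report how the stdlib parser and defusedxml each treat `data`."""
    # The stdlib parser expands the internal entity into plain text.
    results = {"stdlib": std_minidom.parseString(data).documentElement.toxml()}
    try:
        from defusedxml import minidom as safe_minidom
        from defusedxml.common import EntitiesForbidden
    except ImportError:
        results["defusedxml"] = "not installed"
        return results
    try:
        safe_minidom.parseString(data)
        results["defusedxml"] = "accepted"
    except EntitiesForbidden as exc:
        # Raised from the entity-declaration handler (forbid_entities=True default)
        results["defusedxml"] = "rejected: " + repr(exc)
    return results


print(parse_results(DOC))
```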
When I remove the file extension, I get no exception:
$ cp fontdata.xml fontdata
$ diffoscope fontdata fontdata
Could the XML comparator fall back to the default text comparator when parsing the file throws an exception?
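A minimal sketch of the requested fallback follows. It is not diffoscope's actual code, and the name `parses_as_xml` is hypothetical. The idea is that the recognizer catches parse errors and answers "not XML" instead of letting the exception escape, so the file can be handled by another comparator (ultimately plain text). It assumes defusedxml's exceptions derive from `defusedxml.common.DefusedXmlException`, the base class of the `EntitiesForbidden` seen above.

```python
from xml.parsers.expat import ExpatError

try:
    # Prefer defusedxml when it is installed, as diffoscope does
    from defusedxml import minidom
    from defusedxml.common import DefusedXmlException
    USING_DEFUSEDXML = True
    PARSE_ERRORS = (ExpatError, DefusedXmlException)
except ImportError:
    from xml.dom import minidom
    USING_DEFUSEDXML = False
    PARSE_ERRORS = (ExpatError,)


def parses_as_xml(data: bytes) -> bool:
    """Hypothetical recognizer: True only if `data` parses as XML.

    Returning False instead of raising would let the caller try the
    next comparator rather than aborting the whole run.
    """
    try:
        minidom.parseString(data)
    except PARSE_ERRORS:
        # Malformed XML (ExpatError), or a document defusedxml refuses,
        # e.g. one that declares entities (EntitiesForbidden)
        return False
    return True
```

With defusedxml installed, a document that declares entities is reported as "not XML" and would fall through to the text comparator. Without defusedxml, the same document parses normally.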
With kind regards, Roland Clobus