Please provide a smarter hex dump differ
This bug was originally reported by Daniel Shahaf (danielsh@apache.org) in Debian bug #841907:
Currently, insertion or deletion of a single byte causes the remainder
of a hex dump to be shown as "all lines are different". The stream of
byte values is the same, but the lines of the hex dump (16 byte values
per line) are not: each pair of lines has only 15 out of 16 byte values
in common.
Example: see attached monkeystudio diff. The difference is on the first
line (0xA9 v. 0x5C 0x32 0x35 0x31), but _every single line_ in the
section shows a three-byte diff; the last three bytes of left line N are
equal to the first three bytes of right line N+1, but the diff overlooks
that. Consequently, the signal/noise ratio of the diff is low.
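
A minimal sketch of the ripple effect, assuming a simplified dump with 16
byte values per line and no offsets (the helper name `hex_lines` and the
sample data are made up for illustration; this is not diffoscope code):

```python
def hex_lines(data: bytes, width: int = 16) -> list[str]:
    """Render data as hex, `width` byte values per line, without offsets."""
    return [data[i:i + width].hex(" ") for i in range(0, len(data), width)]


original = bytes(range(64))
# Insert a single byte after the first byte of the stream.
modified = original[:1] + b"\xff" + original[1:]

left = hex_lines(original)
right = hex_lines(modified)

# Count the line pairs that a line-by-line comparison reports as different.
changed = sum(1 for a, b in zip(left, right) if a != b)
print(f"{changed} of {len(left)} lines differ")
```

Every line after the insertion point is shifted by one byte value, so a
line-oriented diff flags all of them even though only one byte was added.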
I think the following patch should improve the situation: it causes the
output to omit line numbers, and include a newline after each byte value,
so any insertion/deletion of a single byte would result in a diff that
inserts/deletes a single line, without ripple effects. The line numbers
in the diff would correspond to byte offsets in the hex dumped file.
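
As a rough illustration of why one byte value per line should help, here is
a sketch using Python's difflib on a made-up byte stream. The helper
`byte_lines` only imitates the shape of `xxd -p -c1` output; it is an
assumption for the example, not part of the patch:

```python
import difflib


def byte_lines(data: bytes) -> list[str]:
    """One two-digit hex byte value per line, in the spirit of `xxd -p -c1`."""
    return [f"{b:02x}" for b in data]


original = bytes(range(64))
# Same single-byte insertion as in a 16-per-line dump, but diffed per byte.
modified = original[:1] + b"\xff" + original[1:]

diff = list(difflib.unified_diff(
    byte_lines(original), byte_lines(modified), lineterm="", n=0))

# Keep only added/removed lines, skipping the "---" / "+++" file headers.
changes = [line for line in diff
           if line.startswith(("+", "-"))
           and not line.startswith(("+++", "---"))]
print(changes)
```

Here the diff contains a single added line for the inserted byte, and the
hunk header's line number locates it in the byte stream.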
I originally ran into that issue in .rodata diffs, which don't use the
xxd codepath, but the cases are analogous. If this idea works out, we
should teach the same trick to the ELF comparator's «readelf --hex-dump»
output. (This would also fix #838569, about ignoring addresses in
.rodata.)
I haven't tested this idea yet; I'm only filing this issue so I don't
forget it.
Cheers,
Daniel
[[[
diff --git a/diffoscope/comparators/utils.py b/diffoscope/comparators/utils.py
index 1529dae..4c8603d 100644
--- a/diffoscope/comparators/utils.py
+++ b/diffoscope/comparators/utils.py
@@ -350,4 +350,4 @@ class NonExistingArchive(Archive):
 class Xxd(Command):
     @tool_required('xxd')
     def cmdline(self):
-        return ['xxd', self.path]
+        return ['xxd', '-p', '-c1', self.path]
]]]