Use of --scan option with zipdetails
I'm the upstream author of zipdetails.
I recently got an issue reported (see https://github.com/pmqs/zipdetails/issues/24) that involves running diffoscope on a zip file.
Looking at https://salsa.debian.org/reproducible-builds/diffoscope/-/blob/master/diffoscope/comparators/zip.py?ref_type=heads, I see that zipdetails is invoked with the --scan and --redact options:
    # Excerpt from diffoscope/comparators/zip.py
    class Zipdetails(Command):
        @tool_required("zipdetails")
        def cmdline(self):
            return ["zipdetails", "--redact", "--scan", "--utc", self.path]
It's not clear to me why you want to use --scan, but I'd rate it as a dangerous option in this context.
Here is why: the --scan option is intended for cases where you encounter a corrupt or truncated zip file and want to find any remaining traces of the zip metadata. It scans the file very aggressively for the 4-byte zip header signatures: by design, it checks every 4-byte sequence in the file to see whether it matches one of the zip header signatures. When it finds a match, it blindly decodes what it finds, regardless of what is actually there.
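The scanning approach described above can be sketched in a few lines of Python: slide a 4-byte window over the file and flag every offset whose little-endian value matches a known signature, with no check that the surrounding bytes make sense. This is only an illustration of the general technique, not zipdetails' actual code (zipdetails is written in Perl); the signature table is a subset of those defined in PKWARE's APPNOTE.TXT, and the function name naive_scan is hypothetical.

```python
import struct

# A subset of the 4-byte zip signatures from PKWARE's APPNOTE.TXT
# (values as read little-endian from the file).
SIGNATURES = {
    0x04034B50: "LOCAL HEADER",
    0x02014B50: "CENTRAL HEADER",
    0x06054B50: "END CENTRAL HEADER",
    0x08074B50: "DATA DESCRIPTOR",
}


def naive_scan(data: bytes) -> list[tuple[int, str]]:
    """Hypothetical helper: report every offset holding a known signature.

    Every byte offset is checked, and no validation is done on what
    follows a match, so signature bytes that merely happen to occur
    inside member data are reported just like real headers.
    """
    hits = []
    for offset in range(len(data) - 3):
        (value,) = struct.unpack_from("<I", data, offset)
        name = SIGNATURES.get(value)
        if name is not None:
            hits.append((offset, name))
    return hits
```

For example, naive_scan(b"junk" + b"PK\x03\x04" + b"more") returns [(4, "LOCAL HEADER")], even though those 12 bytes are not a zip file at all.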
That approach is susceptible to false positives, as the reported bug shows.
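One way to reproduce such a false positive without zipdetails is to build a valid archive whose stored member data happens to contain a signature. The sketch below uses only Python's standard zipfile module; the member name, timestamp and payload are made up for illustration. The archive has exactly one entry according to its central directory, but a raw byte search finds two local-header signatures, and a scanner that checks every offset would decode the second one as a bogus local header.

```python
import io
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"  # 0x04034b50, as stored little-endian on disk

# Hypothetical payload whose bytes happen to contain a local-header signature.
payload = b"some data " + LOCAL_HEADER_SIG + b"\x00" * 26 + b" more data"

buf = io.BytesIO()
# A fixed timestamp keeps the archive bytes reproducible; ZIP_STORED (the
# ZipInfo default) copies the payload into the archive unchanged.
info = zipfile.ZipInfo("member.bin", date_time=(2024, 1, 1, 0, 0, 0))
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(info, payload)

data = buf.getvalue()

# The central directory is authoritative: the archive has one real entry.
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    real_entries = len(zf.infolist())

# A raw search for the signature also hits the copy inside the member's data.
signature_hits = data.count(LOCAL_HEADER_SIG)

print(f"entries: {real_entries}, local-header signatures: {signature_hits}")
```

Running it prints "entries: 1, local-header signatures: 2". Compressing the member would usually hide the pattern, but stored members, or compressed data that happens to contain those four bytes, can still trigger it.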
Also, I'm not clear why you are using the --redact option. If the intention is to find the delta between two zip files, doesn't redacting the filenames make that more difficult?