
Use of --scan option with zipdetails

I'm the upstream author of zipdetails.

I recently had an issue reported (see https://github.com/pmqs/zipdetails/issues/24) that involves running diffoscope on a zip file.

Looking at https://salsa.debian.org/reproducible-builds/diffoscope/-/blob/master/diffoscope/comparators/zip.py?ref_type=heads I see that zipdetails is invoked with the --scan and --redact options.

```python
class Zipdetails(Command):
    @tool_required("zipdetails")
    def cmdline(self):
        return ["zipdetails", "--redact", "--scan", "--utc", self.path]
```

It's not clear to me why you want to use --scan, but I'd rate it as a dangerous option in this context.

Here is why: the --scan option is intended for use when you encounter a corrupt or truncated zip file and want to find any traces of the zip metadata. It scans the zip file very aggressively for the 4-byte zip header signatures. By design, it checks every 4-byte sequence in the file to see whether it matches one of the zip header signatures. If it finds a match, it blindly decodes what it finds, regardless of what is actually there.
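To make that behaviour concrete, here is a minimal Python sketch of a brute-force signature scan. It is not how zipdetails (a Perl program) is implemented; the function name `naive_scan` and the choice of signatures in the table are illustrative assumptions, with the signature values taken from the PKWARE APPNOTE.

```python
import io
import struct
import zipfile

# Little-endian 4-byte record signatures from the PKWARE APPNOTE (a subset).
ZIP_SIGNATURES = {
    0x04034B50: "local file header",       # b"PK\x03\x04"
    0x02014B50: "central directory header",  # b"PK\x01\x02"
    0x06054B50: "end of central directory",  # b"PK\x05\x06"
    0x08074B50: "data descriptor",           # b"PK\x07\x08"
    0x06064B50: "zip64 end of central directory",
    0x07064B50: "zip64 end of central directory locator",
}


def naive_scan(data: bytes) -> list:
    """Return (offset, record name) for every 4-byte window that matches a
    known zip signature. Nothing checks whether the match is genuine
    metadata or payload bytes that merely look like a signature."""
    hits = []
    for offset in range(len(data) - 3):
        (value,) = struct.unpack_from("<I", data, offset)
        name = ZIP_SIGNATURES.get(value)
        if name is not None:
            hits.append((offset, name))
    return hits


# Usage: scan a tiny archive built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("hello.txt", b"hello")
hits = naive_scan(buf.getvalue())
for offset, name in hits:
    print(offset, name)
```

Every offset that matches is reported, and a real decoder would then go on to parse whatever bytes follow as if they were that record.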

That approach is susceptible to false positives, as the reported bug shows.
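A sketch of how such a false positive can arise, assuming a stored (uncompressed) member whose payload happens to contain the bytes `PK\x03\x04`, as a nested .jar or .zip stored without compression would. The archive contents and the helper `count_signature` are hypothetical. Python's `zipfile`, which reads the central directory, sees a single entry, while a brute-force scan also matches the signature sitting inside the payload.

```python
import io
import zipfile

LOCAL_FILE_HEADER = b"PK\x03\x04"  # 0x04034b50, little-endian


def count_signature(data: bytes, sig: bytes) -> int:
    """Count every offset where `sig` occurs, as a brute-force scan would."""
    count, start = 0, 0
    while (pos := data.find(sig, start)) != -1:
        count += 1
        start = pos + 1
    return count


def build_archive() -> bytes:
    # Hypothetical payload that contains a local-file-header signature.
    payload = b"some data " + LOCAL_FILE_HEADER + b" more data"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("innocent.bin", payload)
    return buf.getvalue()


archive = build_archive()
real_entries = len(zipfile.ZipFile(io.BytesIO(archive)).infolist())
scanned_hits = count_signature(archive, LOCAL_FILE_HEADER)
print(f"central directory entries: {real_entries}, scan hits: {scanned_hits}")
```

Walking the archive via the central directory only sees the records the archive actually declares; scanning every offset cannot tell the difference.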

Also, I'm not clear why you are using the --redact option. If the intention is to find the differences between two zip files, doesn't redacting the filenames make that more difficult?
