Draft: Add support for nar files
I encounter nar files when working with GNU Guix, but the format comes from the Nix project.
This is the archive format used for Guix binary substitutes, so it's very useful to be able to use diffoscope to compare nars (it's equivalent to comparing .deb's). Currently though, you have to unpack the nar first, then run diffoscope on the directories. For simplicity, I've been looking at whether diffoscope could take care of that.
I've got to the point though where I'm going round in circles. In principle, unpacking the file then comparing the directories sounds simple, but I'm not sure there are any other containers that work this way? I'm also not sure if this is the right approach to fit this into diffoscope. I'm not sure how much Python I should try and write for nars, and how much I should try and just delegate to other classes?
Does anyone have any thoughts on the approach?
> In principle, unpacking the file then comparing the directories sounds simple, but I'm not sure there are any other containers that work this way?
Hm, something must be broken somewhere, as that is how all Archive subclasses are meant to work -- i.e. diffoscope should essentially recurse into them automatically. Take a look at the bzip2 comparator (diffoscope/comparators/bzip2.py), for example: it implements the Archive.extract method which takes a dest_dir as a parameter, with no need to manage a _contents and separate classes for the different types of files within the archive. All extract has to do is ensure that the contents end up within this dest_dir.
Would you like another try at this, or should I jump in? I see you've added test files to your .nar so I think I have everything I might need, but I'm sure you'd feel more satisfied if you could finish your own MR. (Of course, I might be misunderstanding what a .nar file really is.)
I think this nar comparator differs from the bzip2 one in that while you can view it as containing a single thing, that thing might be a directory (as well as a file/symlink). I think there's some assumption in the code that an ArchiveMember is a file, and diffoscope crashes if it's a directory; the NarDirectory class avoids that.
Assuming the general approach I've taken is OK, there are maybe two issues.
The first superficial one is that the temporary directories end up in the output, e.g.
    ./bin/diffoscope tests/data/test3.nar tests/data/test4.nar
    --- tests/data/test3.nar
    +++ tests/data/test4.nar
    │   --- /tmp/diffoscope_ro1vy1s5_data/tmpo2s2k6a5_nar/contents
    ├── +++ /tmp/diffoscope_ro1vy1s5_data/tmpuqkgjzcq_nar/contents
    │ │   --- /tmp/diffoscope_ro1vy1s5_data/tmpo2s2k6a5_nar/contents/txt
    │ ├── +++ /tmp/diffoscope_ro1vy1s5_data/tmpuqkgjzcq_nar/contents/txt
    │ │ @@ -1,6 +1,12 @@
    │ │ +A common form of lorem ipsum reads:
    │ │ +
    │ │  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
    │ │  incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
    │ │  nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
    │ │  Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
    │ │  fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
    │ │  culpa qui officia deserunt mollit anim id est laborum.
    │ │ +
    │ │ +"Lorem ipsum" text is derived from sections 1.10.32--3 of Cicero's De finibus
    │ │ +bonorum et malorum (On the Ends of Goods and Evils, or alternatively [About]
    │ │ +The Purposes of Good and Evil).
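One way to keep the temporary paths out of the labels would be to name each extracted entry relative to the directory it was unpacked into. The snippet below is only a standalone sketch of that idea using os.path.relpath; the function name display_name is hypothetical and this is not how diffoscope itself names archive members.

```python
import os


def display_name(path, extract_root):
    """Return `path` relative to `extract_root`, for use as a diff label.

    Hypothetical helper: both sides of the comparison would then show the
    same label (e.g. "contents/txt") instead of two different /tmp paths.
    """
    return os.path.relpath(path, extract_root)


# Paths shaped like the ones in the output above.
root_a = os.path.join(os.sep, "tmp", "diffoscope_ro1vy1s5_data", "tmpo2s2k6a5_nar")
root_b = os.path.join(os.sep, "tmp", "diffoscope_ro1vy1s5_data", "tmpuqkgjzcq_nar")

label_a = display_name(os.path.join(root_a, "contents", "txt"), root_a)
label_b = display_name(os.path.join(root_b, "contents", "txt"), root_b)
```

With labels like these, the two header lines of the nested diff would be identical apart from the --- and +++ markers.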
The other, maybe more important one is that the temporary directory management tied in with the Python garbage collection causes diffoscope to crash after the comparison: it seems to try deleting a directory without deleting the contents. get_temporary_directory is used elsewhere, so I'm not sure what I'm doing differently; maybe it's something to do with the temporary directory containing directories though...
There might also be another problem for weird comparisons. When trying to compare two nars, one containing a text file and the other a directory, diffoscope seems to fall back to a binary comparison. This is a bit of an edge case, but diffoscope seems to handle it better when just given a file and directory to compare outside of a nar, so I'm guessing this is some problem in the implementation I've done.
    → ./bin/diffoscope tests/data/test1.nar tests/data/test4.nar
    --- tests/data/test1.nar
    +++ tests/data/test4.nar
    │┄ Format-specific differences are supported for nar files but no file-specific differences were detected; falling back to a binary diff.
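On the cleanup crash mentioned above: the symptom described (deleting a directory without deleting its contents) matches what plain Python does when a non-recursive removal hits a directory that still has children. The following is a standalone illustration of that behaviour with the standard library only; it makes no claim about what get_temporary_directory does internally, and the helper names are hypothetical.

```python
import os
import shutil
import tempfile


def make_nested_tree():
    """Create a temporary directory that contains a directory and a file."""
    root = tempfile.mkdtemp(suffix="_nar")
    os.makedirs(os.path.join(root, "contents", "subdir"))
    with open(os.path.join(root, "contents", "subdir", "txt"), "w") as f:
        f.write("Lorem ipsum\n")
    return root


def try_plain_rmdir(path):
    """Return True if os.rmdir succeeded, False if it raised OSError."""
    try:
        os.rmdir(path)
    except OSError:
        # os.rmdir only removes empty directories.
        return False
    return True


root = make_nested_tree()
rmdir_worked = try_plain_rmdir(root)

# A recursive removal copes with nested directories.
shutil.rmtree(root)
still_exists = os.path.exists(root)
```

If the crash really does come from a non-recursive delete, making sure nested directories are removed before (or together with) their parent would be one thing to check.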
Thanks for the background. I'll have a think about the right solution given that the nar can be different things - I suspect there is a vaguely elegant solution, but I might need to fiddle first. Can you upload different nar files for each case (dir and file/symlink)? Otherwise I will likely miss something.
Changing the general approach will probably affect the other things you mention too, so I won't address them here/now.
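To make the three cases concrete, here is a rough standalone sketch of an extract step that always writes its result under a given dest_dir, whether the payload is a file, a symlink or a directory. It does not parse real nar files and does not use diffoscope's classes: the payload is a toy stand-in (bytes for a file, str for a symlink target, dict for a directory), and the function name and the "contents" default are assumptions made for illustration.

```python
import os
import tempfile


def extract(payload, dest_dir, name="contents"):
    """Materialise `payload` under `dest_dir` and return the path created.

    Toy payload types (hypothetical, not the nar format):
      bytes -> regular file
      str   -> symlink pointing at that target
      dict  -> directory whose values are nested payloads
    """
    path = os.path.join(dest_dir, name)
    if isinstance(payload, dict):
        os.mkdir(path)
        for child_name, child in payload.items():
            extract(child, path, child_name)
    elif isinstance(payload, str):
        os.symlink(payload, path)
    else:
        with open(path, "wb") as f:
            f.write(payload)
    return path


dest_dir = tempfile.mkdtemp()

# Directory case: a directory holding a file and a symlink to it.
dir_path = extract({"txt": b"Lorem ipsum\n", "link": "txt"}, dest_dir)

# Single-file case: same entry point, different kind of result.
file_path = extract(b"Lorem ipsum\n", dest_dir, name="single")
```

Whatever the payload is, the caller only ever sees one path inside dest_dir, which could then be handed to the normal file or directory comparison.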
@cbaines-guest Ooh, I forgot I had worked on this!
I'm no expert on GitLab, but it still seems to exist https://salsa.debian.org/reproducible-builds/diffoscope/-/tree/mr-107-nar-support?ref_type=heads