Rclobus/reproducible images
Keep the original timestamps of the individual files as much as possible. Always set SOURCE_DATE_EPOCH at the beginning and use that value where needed.
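A minimal sketch of what "keep the original timestamps while copying" means in practice, assuming GNU coreutils (`touch -d`, `stat -c`). The file names and the date are made up for illustration and are not taken from the merge request.

```sh
#!/bin/sh
# Sketch: plain `cp` stamps the copy with the current time,
# while `cp -a` carries the source's modification time over.
set -e

workdir="$(mktemp -d)"
src="${workdir}/package-file"
echo "content" > "${src}"

# Pretend this file was shipped by a package built in September 2020 (hypothetical date).
touch -d '2020-09-26 10:42:48 UTC' "${src}"

cp "${src}" "${workdir}/plain-copy"       # mtime becomes "now"
cp -a "${src}" "${workdir}/archive-copy"  # mtime is preserved

src_mtime="$(stat -c %Y "${src}")"
plain_mtime="$(stat -c %Y "${workdir}/plain-copy")"
archive_mtime="$(stat -c %Y "${workdir}/archive-copy")"

echo "source:       ${src_mtime}"
echo "plain copy:   ${plain_mtime}"
echo "archive copy: ${archive_mtime}"
```

Only the archive copy reports the same modification time as the source.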
added 16 commits
- 6e4e10f0 - 1 commit from branch live-team:master
- 9059fbb7 - Keep timestamps while copying
- f37495a3 - Keep timestamps while copying
- c32ee4e5 - Keep timestamps while copying
- 743bbacd - Keep timestamps while copying
- 293bb05d - Keep timestamps while copying
- abcb27d8 - Ensure that SOURCE_DATE_EPOCH is always set
- 5a258292 - Pass along SOURCE_DATE_EPOCH and keep timestamps while copying
- 5b0aa37a - Pass along SOURCE_DATE_EPOCH and create binary/.disk outside the chroot
- f1142c67 - Use SOURCE_DATE_EPOCH for the splash screen
- 74e8551b - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
- 921d074a - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
- e6388aef - Keep timestamps while copying and remove the file filesystem.XXX-remove when empty
- d8272c0e - Apply SOURCE_DATE_EPOCH to newly generated files
- a83e025a - Apply SOURCE_DATE_EPOCH to .disk/mkisofs
- 2170ef0a - Set timestamps in embedded files
        lb chroot_prep install "${CHROOT_PREP_MOUNTS}" "${@}"
    fi

    # Apply SOURCE_DATE_EPOCH to all modified and created files
    find binary -newermt "$(date '+%Y-%m-%d %H:%M:%S' -d @${SOURCE_DATE_EPOCH})" -print -exec touch '{}' -d@${SOURCE_DATE_EPOCH} --no-dereference ';' > binary.modified_timestamps

What is setting SOURCE_DATE_EPOCH in this context? If it's the live-build package, then it will be live-build's latest changelog entry. But this means that once a build is re-run with newer packages, their files will be artificially changed to match the timestamp of live-build instead of that of their own package. Unless I'm missing something, this doesn't look right to me.

changed this line in version 6 of the diff
The find command sets the timestamp for all files, directories and symlinks in the binary directory. The content of this binary directory becomes the root of the generated ISO image. The environment variable SOURCE_DATE_EPOCH can either be set outside of the invocation of lb build, as I am doing, or it will be set in functions/configuration.sh (abcb27d8).

I've now run a new batch of tests without setting SOURCE_DATE_EPOCH before invoking lb build. The results are not what they should be. I'll update this merge request soon.

The timestamps of the files in the chroot, which will land in live/filesystem.squashfs, are clamped by mksquashfs when SOURCE_DATE_EPOCH is set. The find command that you pointed out will set the timestamps for all files in the binary directory that are newer than SOURCE_DATE_EPOCH. All older files will have their timestamp preserved.
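The clamping behaviour described above can be tried in isolation. The following is a small sketch, assuming GNU find, touch, date and stat; the epoch value and the two file names are hypothetical stand-ins, not files from a real build.

```sh
#!/bin/sh
# Sketch: only entries newer than SOURCE_DATE_EPOCH are touched;
# older entries keep their original modification time.
set -e

SOURCE_DATE_EPOCH=1601116968   # hypothetical value, chosen for illustration only
demo="$(mktemp -d)"
mkdir "${demo}/binary"

echo "old" > "${demo}/binary/from-package"
echo "new" > "${demo}/binary/generated"

# One file a day older than the epoch, one file an hour newer.
touch -d "@$((SOURCE_DATE_EPOCH - 86400))" "${demo}/binary/from-package"
touch -d "@$((SOURCE_DATE_EPOCH + 3600))" "${demo}/binary/generated"

# Same shape as the command in the diff: list every entry that is newer
# than the reference time and reset its mtime to SOURCE_DATE_EPOCH.
reference="$(date '+%Y-%m-%d %H:%M:%S' -d "@${SOURCE_DATE_EPOCH}")"
(
    cd "${demo}"
    find binary -newermt "${reference}" -print \
        -exec touch --no-dereference -d "@${SOURCE_DATE_EPOCH}" '{}' ';'
) > "${demo}/binary.modified_timestamps"

cat "${demo}/binary.modified_timestamps"
stat -c '%Y %n' "${demo}/binary/from-package" "${demo}/binary/generated"
```

The directory `binary` itself was created just now, so it is listed and clamped as well, whereas `from-package` is left alone. Whether an entry is touched depends only on its mtime relative to SOURCE_DATE_EPOCH, not on where the file came from.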
added 21 commits
- 2170ef0a...3f7dd00f - 6 commits from branch live-team:master
- 5f188771 - Keep timestamps while copying
- cfcb5657 - Keep timestamps while copying
- bb10ed35 - Keep timestamps while copying
- 3b753026 - Keep timestamps while copying
- ed2b2608 - Keep timestamps while copying
- 15673b91 - Ensure that SOURCE_DATE_EPOCH is always set in all sub scripts.
- a37de1af - Pass along SOURCE_DATE_EPOCH and keep timestamps while copying
- 42674abc - Pass along SOURCE_DATE_EPOCH
- 327bb4cc - Use SOURCE_DATE_EPOCH for the splash screen
- e048b073 - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
- eb3ff072 - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
- 442d5063 - Keep timestamps while copying and remove the file filesystem.XXX-remove when empty
- acc28b28 - Apply SOURCE_DATE_EPOCH to newly generated files
- d7b5e037 - Apply SOURCE_DATE_EPOCH to .disk/mkisofs
- 9a742914 - Set timestamps in embedded files
added 9 commits
- eb57b2f1 - SOURCE_DATE_EPOCH is always set
- 8baf6418 - Use SOURCE_DATE_EPOCH for the splash screen
- a0f7af1d - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
- 83fea8cc - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
- e587110d - Keep timestamps while copying and remove the file filesystem.XXX-remove when empty
- adb7717e - Apply SOURCE_DATE_EPOCH to newly generated files
- 1fb3551f - Apply SOURCE_DATE_EPOCH to .disk/mkisofs
- 5aa31ae2 - Set timestamps in embedded files
- cdda09bd - Resolve the timestamp of LB_ISO_VOLUME during building the image (lb build)...
I've updated the code now.
I've tested the following main scenarios:
A) Calling 'lb config' without SOURCE_DATE_EPOCH set, followed by 'lb build', will result in an image that is nearly the same as before this patch. The only difference is that the timestamps in the image (not only in the squashfs part) will all be identical to the moment 'lb build' was invoked (in local time).
B) Setting SOURCE_DATE_EPOCH, followed by 'lb config' and 'lb build': the usual scenario, which results in reproducible images (with UTC timestamps).

Other (less typical) scenarios:
C) Calling 'lb config' without SOURCE_DATE_EPOCH set, then setting SOURCE_DATE_EPOCH, followed by 'lb build': not a typical invocation sequence. The timestamps are set to SOURCE_DATE_EPOCH, in local time.
D) Setting SOURCE_DATE_EPOCH, then calling 'lb config': the configuration will use UTC and can be used as a basis for later reproducible builds (by setting SOURCE_DATE_EPOCH and calling 'lb build').
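As a rough illustration of the "set, otherwise default to now" behaviour behind scenarios A and B, and of the local-time versus UTC difference, here is a sketch assuming GNU date. It is not the code from functions/configuration.sh, which is not shown in this thread.

```sh
#!/bin/sh
# Sketch: respect a caller-supplied SOURCE_DATE_EPOCH, else fall back to "now".
set -e

if [ -z "${SOURCE_DATE_EPOCH:-}" ]; then
    # Scenario A: nothing was set, so the moment of invocation is used.
    SOURCE_DATE_EPOCH="$(date '+%s')"
fi
# Export it so that every sub-script sees the same value.
export SOURCE_DATE_EPOCH

# The same instant, rendered once in local time and once in UTC.
local_stamp="$(date -d "@${SOURCE_DATE_EPOCH}" '+%Y-%m-%d %H:%M:%S %z')"
utc_stamp="$(date --utc -d "@${SOURCE_DATE_EPOCH}" '+%Y-%m-%d %H:%M:%S %z')"

echo "epoch: ${SOURCE_DATE_EPOCH}"
echo "local: ${local_stamp}"
echo "UTC:   ${utc_stamp}"
```

Both strings describe the same instant. Only the rendering differs, which matters wherever a formatted date (rather than the raw epoch) ends up inside the image.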
> The only difference is that the timestamps in the image (not only in the squashfs part) will all be identical to the moment 'lb build' was invoked (in local time).
As mentioned before, changing timestamps of files that come from packages is wrong. The hashsums will no longer match. This does not help with reproducibility in any way: if a package doesn't change, its timestamps do not change either.
Fixing the timestamps of files created during the build process is of course fine.
Ah, I think I understand your issue now. The timestamps (both in the ISO and squashfs part) are preserved as much as possible, due to the many cp -a commands that I've added. The find -newermt command will only touch files/folders/symlinks that have a timestamp that is newer than SOURCE_DATE_EPOCH and therefore could not have originated from the packages.

I've used this configuration to test the difference in behaviour of upstream/master and rclobus/reproducible_images:
lb config --apt-http-proxy http://localhost:3142 --parent-mirror-bootstrap http://localhost:3142/snapshot.debian.org/archive/debian/20200926T104248Z --parent-mirror-binary http://deb.debian.org/debian/ --security false --updates false --apt-options "--yes -o Acquire::Check-Valid-Until=false" --distribution buster --debian-installer none --loadlin false --cache-packages false
When I compare the result with diffoscope, I can see that in e.g. /boot/grub/i386-efi/ the .mod files now have their original (as packaged) timestamps preserved, which was not the case in the current upstream/master.

> The find -newermt command will only touch files/folders/symlinks that have a timestamp that is newer than SOURCE_DATE_EPOCH and therefore could not have originated from the packages.

This depends entirely on what the SOURCE_DATE_EPOCH is set to. That is a problem - it must never change files from packages, regardless of conditions and configs.
But isn't the whole idea behind SOURCE_DATE_EPOCH that it should allow you to reproduce an earlier state?

A regular user will not set SOURCE_DATE_EPOCH and will get a live image that contains the timestamp of the moment lb build was invoked, and all original timestamps are preserved (which I think is correct).

When SOURCE_DATE_EPOCH is set before calling lb build, one needs to be sure that the packages that are used for building match that specific date. That's why I've used snapshots as the source for my Debian packages. I'm able to verify with 100% certainty that the image comes from that specific point in time and has not been altered.

If the user were to set SOURCE_DATE_EPOCH to a value too far in the past, the resulting image could not be traced to the moment it was generated and therefore would not be reproducible at all. A checksum of that ISO image would then quickly show that something went wrong.

Alternatively, if the find -newermt approach should be abandoned again, the scripts will be littered with individual touch commands, similar to my original approach, which can be seen in !209 (closed). Each newly created or modified file would then need its own touch command to ensure a consistent, identical timestamp throughout the image.

> But isn't the whole idea behind SOURCE_DATE_EPOCH, that it should allow you to reproduce an earlier state?
An earlier state is different from a state that never was. Packages are fixed and reproducible themselves - if you change their content, they are no longer reproducible, and most importantly the hashsums do not match anymore. Changing files created by live-build/config/etc is fine, changing files that come from repositories is not.
So we are now discussing 4 different scenarios:
1. No SOURCE_DATE_EPOCH is set by the user
2. SOURCE_DATE_EPOCH is set by the user, which matches the SOURCE_DATE_EPOCH which was used when creating the official live images
3. SOURCE_DATE_EPOCH is set by the user to a date before the most recent file from a package that is included in the live image
4. SOURCE_DATE_EPOCH is set by the user to a date later than the SOURCE_DATE_EPOCH which was used when creating the official live images
Are scenarios 1 and 2 OK and do not need further discussion?
In scenario 3 some timestamps of some files of packages will be adjusted (which, I agree, should not happen)
In scenario 4 the timestamps of all newly modified files will differ from the original image.
I personally think that scenarios 3 and 4 result in images that show that reproducibility is not achieved, and therefore are confirmation that scenario 2 is working properly.
I think it would be nice if the official Debian live images were reproducible. That means that the official builder daemon would use snapshots and set SOURCE_DATE_EPOCH accordingly. Any other user who wants to verify that published binary ISO image would then be able to re-generate that file with 100% identical content.

The problem is not what the variable is set to, the problem is doing arbitrary and unexpected changes to files that should be immutable and that are checksummed. This should not happen under any circumstances, regardless of how one feels about reasonableness of provided configuration. If it means more work, then so be it - for example, we could have a function for creating files, rather than cat/echo.
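One way to picture the suggested "function for creating files" is the sketch below. The name `Write_file_at_epoch` is hypothetical and does not exist in live-build. It assumes GNU touch and stat, and that SOURCE_DATE_EPOCH is already set (a fallback value is used here purely so the example runs on its own).

```sh
#!/bin/sh
# Sketch of a file-creating helper that stamps its own output,
# so that no blanket find/touch pass over other files is needed.
set -e

# Hypothetical helper: write stdin to the given path, then set the
# file's mtime to SOURCE_DATE_EPOCH.
Write_file_at_epoch ()
{
    _target="${1}"
    cat > "${_target}"
    touch --no-dereference -d "@${SOURCE_DATE_EPOCH}" "${_target}"
}

# Fallback value for this standalone example only (an assumption).
SOURCE_DATE_EPOCH="${SOURCE_DATE_EPOCH:-1601116968}"

out="$(mktemp -d)"
printf 'Debian GNU/Linux live image\n' | Write_file_at_epoch "${out}/info"
stat -c '%Y %n' "${out}/info"
```

With a helper of this kind, only files that the build scripts themselves write get a new timestamp. Note that creating a file also updates the mtime of its parent directory, which this sketch does not address.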
added 11 commits
- 037e93fe - 1 commit from branch live-team:master
- 978b0846 - Ensure that SOURCE_DATE_EPOCH is always set in all sub scripts.
- 189829a0 - Pass along SOURCE_DATE_EPOCH and keep timestamps while copying
- 5fe46913 - SOURCE_DATE_EPOCH is always set
- 0172eca1 - Use SOURCE_DATE_EPOCH for the splash screen
- 4f952922 - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
- 3d974de1 - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
- a8513845 - Apply SOURCE_DATE_EPOCH to newly generated files
- 4752f672 - Apply SOURCE_DATE_EPOCH to .disk/mkisofs
- 195436b3 - Set timestamps in embedded files
- 84547bb5 - Resolve the timestamp of LB_ISO_VOLUME during building the image (lb build)...
added 11 commits
- b8279ed4 - 1 commit from branch live-team:master
- c6ac83ac - Ensure that SOURCE_DATE_EPOCH is always set in all sub scripts.
- 9d17cdbf - SOURCE_DATE_EPOCH is always set
- e91e384d - Use SOURCE_DATE_EPOCH for 'now'
- 834f3379 - Use SOURCE_DATE_EPOCH for 'now'
- a1545d9e - Use SOURCE_DATE_EPOCH for the partition-id
- 42afa8fd - Use SOURCE_DATE_EPOCH for the partition-id
- b46c7672 - Set timestamp embedded in EFI files
- 1dd3fe30 - Set timestamp in embedded files
- b1c7409d - Set timestamp in embedded files
- 9672c22e - Apply SOURCE_DATE_EPOCH to newly generated files
I've cleaned the code up a bit (consistent invocations of date and touch) and read more about SOURCE_DATE_EPOCH at https://reproducible-builds.org/docs/source-date-epoch/. The section '“Lying about the time” / “violates language spec”' in particular shows that the purpose of SOURCE_DATE_EPOCH is exactly to modify timestamps.

A. The party that creates an image that is meant to be trustworthy (e.g. the debian-cd team) would not need to set SOURCE_DATE_EPOCH to get a traceable image, but IMO would set SOURCE_DATE_EPOCH and use snapshots.debian.org.

B. The party that wants to verify the image sets SOURCE_DATE_EPOCH to the timestamp of the image and will be able to create a 100% identical file, thereby verifying the trustworthiness of party A.

The patch is now ready to be merged.
> The patch is now ready to be merged.
Sorry, but it is not. I've already explained that changing content of files coming from packages breaks hashsums, and it is wrong. You need to remove those blanket find -exec calls, and instead change the timestamps of files created directly by live-build/live-config.
Note that the find command(s) in the MR only work on the binary folders.

The new file binary.modified_timestamps contains a list of all files/folders that got their timestamp aligned to the moment lb build was invoked:

    binary
    binary/sha256sum.txt
    binary/efi.img
    binary/EFI
    binary/EFI/boot
    binary/.disk
    binary/.disk/info
    binary/.disk/archive_trace
    binary/isolinux
    binary/isolinux/splash.png
    binary/isolinux/utilities.cfg
    binary/isolinux/menu.cfg
    binary/isolinux/stdmenu.cfg
    binary/isolinux/live.cfg
    binary/isolinux/isolinux.cfg
    binary/boot
    binary/boot/grub
    binary/boot/grub/efi.img
    binary/boot/grub/i386-efi
    binary/boot/grub/i386-efi/grub.cfg
    binary/boot/grub/x86_64-efi
    binary/boot/grub/x86_64-efi/grub.cfg
    binary/boot/grub/loopback.cfg
    binary/boot/grub/theme.cfg
    binary/boot/grub/grub.cfg
    binary/boot/grub/config.cfg
    binary/boot/grub/live-theme
    binary/boot/grub/live-theme/theme.txt
    binary/live
    binary/live/initrd.img
    binary/live/vmlinuz
    binary/live/filesystem.packages
    binary/live/filesystem.squashfs
    binary/live/filesystem.size
As you can see, no files of regular packages were adjusted. Also (as I wrote on my blog), with some additional hooks it is possible to get a 100% reproducible binary/live/filesystem.squashfs, which exactly matches the sha256sum.
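Checking that two builds produced the same file comes down to comparing digests. A minimal sketch, assuming `sha256sum` from GNU coreutils: the helper name `Same_sha256` is hypothetical, and the temporary files merely stand in for the filesystem.squashfs of two separate builds.

```sh
#!/bin/sh
# Sketch: compare the SHA-256 digests of two build artefacts.
set -e

# Hypothetical helper: succeed only if both files have the same digest.
Same_sha256 ()
{
    _first="$(sha256sum < "${1}" | cut -d ' ' -f 1)"
    _second="$(sha256sum < "${2}" | cut -d ' ' -f 1)"
    [ "${_first}" = "${_second}" ]
}

# Stand-ins for the artefacts of two independent builds.
tmp="$(mktemp -d)"
printf 'identical payload\n' > "${tmp}/build1.squashfs"
printf 'identical payload\n' > "${tmp}/build2.squashfs"
printf 'different payload\n' > "${tmp}/build3.squashfs"

if Same_sha256 "${tmp}/build1.squashfs" "${tmp}/build2.squashfs"; then
    echo "build1 and build2: reproducible"
fi
if ! Same_sha256 "${tmp}/build1.squashfs" "${tmp}/build3.squashfs"; then
    echo "build1 and build3: differ"
fi
```

A digest comparison only says whether the files differ. Finding out where they differ is what diffoscope, mentioned earlier in this thread, is for.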