
Rclobus/reproducible images

Keep the original timestamps of the individual files as much as possible. Always set SOURCE_DATE_EPOCH at the beginning and use that value where needed.
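A minimal sketch of "always set SOURCE_DATE_EPOCH at the beginning", assuming a POSIX shell. This is the common fallback idiom, not necessarily the exact code in this merge request: the caller's value wins, otherwise the current time is captured once and exported so that every sub script sees the same value.

```shell
# Keep a value supplied by the caller; otherwise capture "now" exactly once.
SOURCE_DATE_EPOCH="${SOURCE_DATE_EPOCH:-$(date '+%s')}"
# Export it so that all child scripts inherit the identical timestamp.
export SOURCE_DATE_EPOCH
```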

Edited by Roland Clobus

        lb chroot_prep install "${CHROOT_PREP_MOUNTS}" "${@}"
    fi

    # Apply SOURCE_DATE_EPOCH to all modified and created files
    find binary -newermt "$(date '+%Y-%m-%d %H:%M:%S' -d @${SOURCE_DATE_EPOCH})" -print -exec touch '{}' -d@${SOURCE_DATE_EPOCH} --no-dereference ';' > binary.modified_timestamps
    • The find command sets the timestamp for all files, directories and symlinks in the binary directory. The content of this binary directory becomes the root of the generated ISO image. The environment variable SOURCE_DATE_EPOCH can either be set before invoking lb build, as I do, or it will be set in functions/configuration.sh (abcb27d8).

      I've now run a new batch of tests without setting SOURCE_DATE_EPOCH before invoking lb build. The results are not what they should be. I'll update this merge request soon.

    • Right, but changing all files in the chroot is not right then? Only files explicitly created at build time should have their timestamps changed. Anything coming from packages should keep the timestamps they have in the packages.

    • The timestamps of the files in the chroot, which will land in live/filesystem.squashfs, are clamped by mksquashfs when SOURCE_DATE_EPOCH is set. The find command that you pointed out will set the timestamps for all files that are newer than SOURCE_DATE_EPOCH in the binary directory. All older files will have their timestamp preserved.
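The behaviour described here (newer entries are clamped, older ones are left alone) can be tried in isolation. The following is a sketch only, assuming GNU find, touch, date and stat. The directory, file names and epoch value are made up for the illustration: the epoch corresponds to the 20200926T104248Z snapshot timestamp, and the "package" file is simply a file given an older mtime.

```shell
SOURCE_DATE_EPOCH=1601116968            # illustrative value: 2020-09-26T10:42:48Z
demo_dir="$(mktemp -d)"

# A file one day older than the epoch, standing in for a file from a package.
touch -d "@$((SOURCE_DATE_EPOCH - 86400))" "${demo_dir}/from_package"
# A file created "now", standing in for a file generated during the build.
touch "${demo_dir}/generated"

# Same shape as the command in the diff: list and touch only the newer entries.
find "${demo_dir}" -newermt "$(date '+%Y-%m-%d %H:%M:%S' -d "@${SOURCE_DATE_EPOCH}")" \
    -print -exec touch '{}' -d "@${SOURCE_DATE_EPOCH}" --no-dereference ';' \
    > "${demo_dir}.modified_timestamps"

old_mtime="$(stat -c %Y "${demo_dir}/from_package")"
new_mtime="$(stat -c %Y "${demo_dir}/generated")"
echo "from_package: ${old_mtime} (unchanged), generated: ${new_mtime} (clamped)"
```

Note that the freshly created directory itself is also newer than the epoch, so it is listed and clamped as well. The sketch only covers the case where SOURCE_DATE_EPOCH is later than every packaged file.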

  • If you want, I can split this change into 2 parts:

    1. Preserving timestamps during copy commands (and similar), none of which need SOURCE_DATE_EPOCH
    2. Remaining steps to modify timestamps, requiring SOURCE_DATE_EPOCH
  • Roland Clobus marked as a Work In Progress


  • Roland Clobus added 21 commits


    • 2170ef0a...3f7dd00f - 6 commits from branch live-team:master
    • 5f188771 - Keep timestamps while copying
    • cfcb5657 - Keep timestamps while copying
    • bb10ed35 - Keep timestamps while copying
    • 3b753026 - Keep timestamps while copying
    • ed2b2608 - Keep timestamps while copying
    • 15673b91 - Ensure that SOURCE_DATE_EPOCH is always set in all sub scripts.
    • a37de1af - Pass along SOURCE_DATE_EPOCH and keep timestamps while copying
    • 42674abc - Pass along SOURCE_DATE_EPOCH
    • 327bb4cc - Use SOURCE_DATE_EPOCH for the splash screen
    • e048b073 - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
    • eb3ff072 - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
    • 442d5063 - Keep timestamps while copying and remove the file filesystem.XXX-remove when empty
    • acc28b28 - Apply SOURCE_DATE_EPOCH to newly generated files
    • d7b5e037 - Apply SOURCE_DATE_EPOCH to .disk/mkisofs
    • 9a742914 - Set timestamps in embedded files


  • Roland Clobus added 9 commits


    • eb57b2f1 - SOURCE_DATE_EPOCH is always set
    • 8baf6418 - Use SOURCE_DATE_EPOCH for the splash screen
    • a0f7af1d - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
    • 83fea8cc - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
    • e587110d - Keep timestamps while copying and remove the file filesystem.XXX-remove when empty
    • adb7717e - Apply SOURCE_DATE_EPOCH to newly generated files
    • 1fb3551f - Apply SOURCE_DATE_EPOCH to .disk/mkisofs
    • 5aa31ae2 - Set timestamps in embedded files
    • cdda09bd - Resolve the timestamp of LB_ISO_VOLUME during building the image (lb build)...


  • Roland Clobus marked this merge request as ready


  • I've updated the code now.

    I've tested the following main scenarios:

    A) Calling 'lb config' without SOURCE_DATE_EPOCH set, followed by 'lb build', will result in an image that is nearly the same as before this patch. The only difference is that the timestamps in the image (not only in the squashfs part) will all be identical to the moment 'lb build' was invoked (in local time).

    B) Set SOURCE_DATE_EPOCH, followed by 'lb config' and 'lb build': The usual scenario, results in reproducible images (with UTC timestamps).

    Other (less typical) scenarios:

    C) Calling 'lb config' without SOURCE_DATE_EPOCH set, setting SOURCE_DATE_EPOCH, followed by 'lb build': Not a typical invocation sequence. The timestamps are set to SOURCE_DATE_EPOCH, in local time.

    D) Set SOURCE_DATE_EPOCH, calling 'lb config'. The configuration will use UTC, and can be used as a basis for later reproducible builds (then setting SOURCE_DATE_EPOCH and calling 'lb build').
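The local-time versus UTC distinction in these scenarios matters because the same SOURCE_DATE_EPOCH renders as different text depending on the timezone. A small sketch, assuming GNU date. The epoch value, the format string and the POSIX timezone string `CEST-2` (two hours ahead of UTC) are illustrative choices, not values taken from live-build.

```shell
SOURCE_DATE_EPOCH=1601116968            # illustrative value: 2020-09-26T10:42:48Z

# Rendered in a local timezone two hours ahead of UTC.
local_stamp="$(TZ='CEST-2' date -d "@${SOURCE_DATE_EPOCH}" '+%Y%m%d-%H:%M')"
# Rendered in UTC, independent of the builder's timezone.
utc_stamp="$(date --utc -d "@${SOURCE_DATE_EPOCH}" '+%Y%m%d-%H:%M')"

echo "local: ${local_stamp}"   # local: 20200926-12:42
echo "utc:   ${utc_stamp}"     # utc:   20200926-10:42
```

Any string derived from the local rendering differs between builders in different timezones, which is why a reproducible build would want the UTC form.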

    • The only difference is that the timestamps in the image (not only in the squashfs part) will all be identical to the moment 'lb build' was invoked (in local time).

      As mentioned before, changing timestamps of files that come from packages is wrong. The hashsums will no longer match. This does not help with reproducibility in any way: if a package doesn't change, its timestamps do not change either.

      Fixing the timestamps of files created during the build process is of course fine.

    • Ah, I think I understand your issue now. The timestamps (both in the ISO and squashfs part) are preserved as much as possible, due to the many cp -a commands that I've added. The find -newermt command will only touch files/folders/symlinks that have a timestamp that is newer than SOURCE_DATE_EPOCH and therefore could not have originated from the packages.

      I've used this configuration to test the difference in behaviour of upstream/master and rclobus/reproducible_images:

      lb config --apt-http-proxy http://localhost:3142 --parent-mirror-bootstrap http://localhost:3142/snapshot.debian.org/archive/debian/20200926T104248Z --parent-mirror-binary http://deb.debian.org/debian/ --security false --updates false --apt-options "--yes -o Acquire::Check-Valid-Until=false" --distribution buster --debian-installer none --loadlin false --cache-packages false

      When I compare the result with diffoscope, I can see that in e.g. /boot/grub/i386-efi/ the .mod files now have their original (as packaged) timestamps preserved, which was not the case in the current upstream/master.
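The effect of the added cp -a commands can be seen on a single file. This is a sketch assuming GNU cp, touch and stat. The file names are invented for the illustration and the old mtime is an arbitrary stand-in for an "as packaged" timestamp.

```shell
work_dir="$(mktemp -d)"

# Give the source file an old mtime (2000-01-01T00:00:00Z).
touch -d '@946684800' "${work_dir}/packaged.mod"

# A plain copy gets the time of the copy; an archive copy keeps the original mtime.
cp    "${work_dir}/packaged.mod" "${work_dir}/plain_copy.mod"
cp -a "${work_dir}/packaged.mod" "${work_dir}/archive_copy.mod"

plain_mtime="$(stat -c %Y "${work_dir}/plain_copy.mod")"
archive_mtime="$(stat -c %Y "${work_dir}/archive_copy.mod")"
echo "plain: ${plain_mtime}, archive: ${archive_mtime}"
```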

    • The find -newermt command will only touch files/folders/symlinks that have a timestamp that is newer than SOURCE_DATE_EPOCH and therefore could not have originated from the packages.

      This depends entirely on what the SOURCE_DATE_EPOCH is set to. That is a problem - it must never change files from packages, regardless of conditions and configs.

    • But isn't the whole idea behind SOURCE_DATE_EPOCH, that it should allow you to reproduce an earlier state?

      A regular user will not set SOURCE_DATE_EPOCH and will get a live image that contains the timestamp of the moment lb build was invoked, and all original timestamps are preserved (which I think is correct).

      When SOURCE_DATE_EPOCH is set before calling lb build, one needs to be sure that the packages that are used for building match that specific date. That's why I've used snapshots as the source for my Debian packages. I'm able to verify with 100% certainty that the image comes from that specific point in time and has not been altered.

      If the user set SOURCE_DATE_EPOCH to a value too far in the past, the resulting image could not be traced to the moment it was generated and therefore would not be reproducible at all. A checksum of that ISO image would then quickly show that something went wrong.

      Alternatively, if the find -newermt approach should be abandoned again, the scripts will be littered with individual touch commands, similar to my original approach, which can be seen in !209 (closed). Each newly created or modified file would then need its own touch command to ensure a consistent, identical timestamp throughout the image.

    • But isn't the whole idea behind SOURCE_DATE_EPOCH, that it should allow you to reproduce an earlier state?

      An earlier state is different from a state that never was. Packages are fixed and reproducible themselves - if you change their content, they are no longer reproducible, and most importantly the hashsums do not match anymore. Changing files created by live-build/config/etc is fine, changing files that come from repositories is not.

    • So we are now discussing 4 different scenarios:

      1. No SOURCE_DATE_EPOCH is set by the user
      2. SOURCE_DATE_EPOCH is set by the user, which matches the SOURCE_DATE_EPOCH which was used when creating the official live images
      3. SOURCE_DATE_EPOCH is set by the user to a date before the most recent file from a package that is included in the live image
      4. SOURCE_DATE_EPOCH is set by the user to a date later than the SOURCE_DATE_EPOCH which was used when creating the official live images

      Are scenarios 1 and 2 OK and do not need further discussion?

      In scenario 3 some timestamps of some files of packages will be adjusted (which, I agree, should not happen)

      In scenario 4 the timestamps of all newly modified files will differ from the original image.

      I personally think that scenarios 3 and 4 result in images that show that reproducibility is not achieved, and therefore are confirmation that scenario 2 is working properly.

      I think it would be nice if the official Debian live images were reproducible. That means that the official builder daemon would use snapshots and set SOURCE_DATE_EPOCH accordingly. Any other user who wants to verify that published binary ISO image would then be able to re-generate that file with 100% identical content.

    • The problem is not what the variable is set to, the problem is doing arbitrary and unexpected changes to files that should be immutable and that are checksummed. This should not happen under any circumstances, regardless of how one feels about reasonableness of provided configuration. If it means more work, then so be it - for example, we could have a function for creating files, rather than cat/echo.
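One possible shape for such a file-creating function, as a sketch only. The name `Create_file` is hypothetical and not an existing live-build helper, and GNU touch is assumed for the `@epoch` syntax. It writes its standard input to the target and then pins the mtime, so only files that pass through it are ever touched.

```shell
# Hypothetical helper: write stdin to the target, then pin its mtime
# to SOURCE_DATE_EPOCH. Files that come from packages never pass through here.
Create_file ()
{
    _target="${1}"
    cat > "${_target}"
    touch -d "@${SOURCE_DATE_EPOCH}" "${_target}"
}

# Usage, in place of a bare "echo ... > file":
SOURCE_DATE_EPOCH=1601116968            # illustrative value
out_dir="$(mktemp -d)"
printf 'example content\n' | Create_file "${out_dir}/info"
```

Compared with a blanket find over the tree, this trades one central command for a call at every place where a file is generated.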

  • Can you split out the patches that ensure timestamps are maintained across copies in a separate MR, please?

  • Ok, that means for me:

    1. An MR that does not need any assumption about the availability of SOURCE_DATE_EPOCH
    2. A follow-up MR that always sets SOURCE_DATE_EPOCH. This second MR might need to modify some files of the first MR.
  • Roland Clobus marked this merge request as draft


  • This MR is now marked as draft. In MR224 the cp and mcopy commands preserve the timestamps of the original files.

  • Roland Clobus added 11 commits


    • 037e93fe - 1 commit from branch live-team:master
    • 978b0846 - Ensure that SOURCE_DATE_EPOCH is always set in all sub scripts.
    • 189829a0 - Pass along SOURCE_DATE_EPOCH and keep timestamps while copying
    • 5fe46913 - SOURCE_DATE_EPOCH is always set
    • 0172eca1 - Use SOURCE_DATE_EPOCH for the splash screen
    • 4f952922 - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
    • 3d974de1 - Use SOURCE_DATE_EPOCH for the partition and keep timestamps while copying
    • a8513845 - Apply SOURCE_DATE_EPOCH to newly generated files
    • 4752f672 - Apply SOURCE_DATE_EPOCH to .disk/mkisofs
    • 195436b3 - Set timestamps in embedded files
    • 84547bb5 - Resolve the timestamp of LB_ISO_VOLUME during building the image (lb build)...


  • Roland Clobus added 11 commits


    • b8279ed4 - 1 commit from branch live-team:master
    • c6ac83ac - Ensure that SOURCE_DATE_EPOCH is always set in all sub scripts.
    • 9d17cdbf - SOURCE_DATE_EPOCH is always set
    • e91e384d - Use SOURCE_DATE_EPOCH for 'now'
    • 834f3379 - Use SOURCE_DATE_EPOCH for 'now'
    • a1545d9e - Use SOURCE_DATE_EPOCH for the partition-id
    • 42afa8fd - Use SOURCE_DATE_EPOCH for the partition-id
    • b46c7672 - Set timestamp embedded in EFI files
    • 1dd3fe30 - Set timestamp in embedded files
    • b1c7409d - Set timestamp in embedded files
    • 9672c22e - Apply SOURCE_DATE_EPOCH to newly generated files


  • I've cleaned the code up a bit (consistent invocations of date and touch) and read more about SOURCE_DATE_EPOCH at https://reproducible-builds.org/docs/source-date-epoch/. Especially the section '“Lying about the time” / “violates language spec”' shows: the purpose of SOURCE_DATE_EPOCH is exactly to modify timestamps.

    A. The party that creates an image that is meant to be trustworthy (e.g. the debian-cd team) would not need to set SOURCE_DATE_EPOCH to get a traceable image, but IMO would set SOURCE_DATE_EPOCH and use snapshot.debian.org.

    B. The party that wants to verify the image sets SOURCE_DATE_EPOCH to the timestamp of the image and will be able to create a 100% identical file, thereby verifying the trustworthiness of party A.

    The patch is now ready to be merged.
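The check that party B performs boils down to comparing checksums of the published and the rebuilt image. A sketch assuming GNU sha256sum. The function name `Images_identical` and the file names are hypothetical, and small text files stand in for real ISO images.

```shell
# Hypothetical helper: succeed only if both files have the same SHA-256 checksum.
Images_identical ()
{
    _sum_a="$(sha256sum < "${1}")"
    _sum_b="$(sha256sum < "${2}")"
    [ "${_sum_a}" = "${_sum_b}" ]
}

# Stand-ins for the published image, a faithful rebuild and an altered image.
verify_dir="$(mktemp -d)"
printf 'image bytes\n'    > "${verify_dir}/official.iso"
printf 'image bytes\n'    > "${verify_dir}/rebuilt.iso"
printf 'tampered bytes\n' > "${verify_dir}/tampered.iso"

if Images_identical "${verify_dir}/official.iso" "${verify_dir}/rebuilt.iso"; then
    echo "rebuild matches the published image"
fi
```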

  • Roland Clobus marked this merge request as ready


  • The patch is now ready to be merged.

    Sorry, but it is not. I've already explained that changing content of files coming from packages breaks hashsums, and it is wrong. You need to remove those blanket find -exec calls, and instead change the timestamps of files created directly by live-build/live-config.

  • Note that the find command(s) in the MR operate only on the binary folders.

    The new file binary.modified_timestamps contains a list of all files/folders that got their timestamp aligned to the moment lb build was invoked:

      binary
      binary/sha256sum.txt
      binary/efi.img
      binary/EFI
      binary/EFI/boot
      binary/.disk
      binary/.disk/info
      binary/.disk/archive_trace
      binary/isolinux
      binary/isolinux/splash.png
      binary/isolinux/utilities.cfg
      binary/isolinux/menu.cfg
      binary/isolinux/stdmenu.cfg
      binary/isolinux/live.cfg
      binary/isolinux/isolinux.cfg
      binary/boot
      binary/boot/grub
      binary/boot/grub/efi.img
      binary/boot/grub/i386-efi
      binary/boot/grub/i386-efi/grub.cfg
      binary/boot/grub/x86_64-efi
      binary/boot/grub/x86_64-efi/grub.cfg
      binary/boot/grub/loopback.cfg
      binary/boot/grub/theme.cfg
      binary/boot/grub/grub.cfg
      binary/boot/grub/config.cfg
      binary/boot/grub/live-theme
      binary/boot/grub/live-theme/theme.txt
      binary/live
      binary/live/initrd.img
      binary/live/vmlinuz
      binary/live/filesystem.packages
      binary/live/filesystem.squashfs
      binary/live/filesystem.size

    As you can see, no files of regular packages were adjusted. Also (as I wrote on my blog), with some additional hooks it is possible to get a 100% reproducible binary/live/filesystem.squashfs, which exactly matches the sha256sum.
