Commit 04ca865a authored by Holger Levsen

berlin-summit: add (unverified) pads from riseup, without interlinks yet

Signed-off-by: Holger Levsen <>
parent 982a5d63
@@ -56,6 +56,7 @@ event_summary: Three day follow-up workshop to Athens 2015 to continue and grow
<div class="eight columns text">
<ul>A number of reports have been published:
<li><a href="{{ "/events/berlin2016/agenda/" | prepend: site.baseurl }}">Meeting agenda and complete minutes</a></li>
<li><a href="/files/ReproducibleBuildsSummitIIReport.pdf">Report by Aspiration</a></li>
<li>by <a href="">Ludovic Courtès, John Darrington and Ricardo Wurmus from the GNU&nbsp;Guix project</a></li>
layout: event_detail
title: Gettext
event: athens2015
order: 150
permalink: /events/berlin/2016/Gettext/
The problem is that timestamps end up in binaries because of Gettext ([[][bug 49654]] discusses this issue). The timestamps originate with =xgettext=, which writes them into the header of =.pot= (the template for translations, extracted from source code strings). =msgmerge= preserves these timestamps when merging actual translations (=.po= files) with translation templates (=.pot= file). =msgfmt= preserves timestamps when it builds a binary file from a =.po= file. The file ends up being a part of the build artifacts.
If =.pot= files were generated and included by upstream developers there would be no problem, but that’s not always the case. Some argue that =.pot= files should not be part of the source tree, because they are generated artifacts. This means that they might be generated at build time, introducing build time timestamps.
The POT creation date is useful for translators because it tells them whether the entire file needs to be reviewed (relative to the =.po= file). Hence, it is debatable whether the timestamp should be kept out in the first place (by patching =xgettext=), or whether it should just be excluded from the generated files. Having the date in the binary files is useful for recovering =.po= files from an =.mo= file using =msgunfmt=, so removing it completely from files may not be desirable (although this is the approach preferred by the former gettext maintainer).
Debian implemented a patch to make =xgettext= respect =SOURCE_DATE_EPOCH=, but it was rejected by the former maintainer. We came up with an alternative approach: instead of using =SOURCE_DATE_EPOCH= for translation templates (which may be inaccurate), compute the latest modification time of all source files and use /that/ as the timestamp (instead of the current time). A patch has been prepared already.
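As a rough illustration of the alternative approach (not the prepared patch itself), the latest modification time of a source tree can be computed as below. The function name =latest_source_mtime= is hypothetical, and the sketch assumes GNU =find=, since =-printf= is a GNU extension.

```shell
# Hypothetical helper: print the newest modification time (in whole seconds
# since the epoch) among all regular files below the given directories.
# Assumes GNU find; -printf is not available in other find implementations.
latest_source_mtime() {
    # %T@ prints the mtime as seconds.fraction; keep only the integer part.
    find "$@" -type f -printf '%T@\n' | cut -d. -f1 | sort -n | tail -n 1
}
```

With GNU =date=, the result can be rendered in a header-like form with =date -u -d "@$(latest_source_mtime src)" '+%Y-%m-%d %H:%M%z'=, where =src= stands for whatever directory holds the sources.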
In addition, we are preparing patches for other approaches such as omitting the timestamp header from the files. Upstream can then pick from one of the possible solutions.
Post-event email update (12/16/2016):
For info: the maintainers of gettext have applied the patches we sent to remove timestamps from the output of gettext-generated .mo files.
layout: event_detail
title: RPM
event: athens2015
order: 80
permalink: /events/berlin/2016/RPM/
brainstorming notes:
Open build service ( ) that runs various configurations for RPM. Vary environment / ... easy.
The build service signs the binaries that get published to the mirror infrastructure.
Discussion point: signatures, you can copy signatures onto the newly built package to obtain the same package.
OpenSUSE might still have MD5 in some places, Fedora has switched to SHA-256.
For Fedora, "Mock" creates the environment and chroot, installs build dependencies and builds. So the build fails when a dependency is missing.
Needs to set SOURCE_DATE_EPOCH? Timestamp will be different, but timestamp is in the spec file? An end user might want to download a source package from anywhere.
Problems in RPM?
- What is the base level of info to have a reproducible build? Is RPM sufficient?
* srpm does not specify the actual dependencies that will be used to build (gcc x.y.z). Maybe need a build-info file.
* no custom field in RPM metadata? No, cannot add arbitrary build metadata to RPM
Can the metadata be extended? Potentially
* Needs metadata about the RPM that is stored outside the RPM.
RPM = cpio archive, want bit-by-bit reproducible ideally
* Issues
order within the archive needs to be deterministic
* Identify the problems for reproducibility in changing RPM
potential push back on removing the build time from the header
host name might be an issue too.
* List of criteria -> see debian and
then set-up test suite to assert reproducibility
* Need to record stuff?
- cpu_type?
- version of mock used?
- how much of the extra stuff we need to record?
-> You can record more than you need, it is ok to have different build info files.
* Interesting: compare build in openSUSE and Fedora (different build system), do we get the same output?
Just run and use diffoscope to compare the output.
* 1 or 2 small goals for the RPM
- Getting to know what needs to change in the RPM build to attain reproducibility
- Document level of reproducibility with a standard test suite (where vary time then env then path then X...)
(digression on Debian not normalizing the environment) -- 15' left
* tool to reproduce the environment (build input, etc...)
would take build info and set-up the build environment
* tool to generate build-info from RPM file
would be of use for Qubes OS
Idea for a hackathon tomorrow?
Could be part of the RPM tool itself or other place.
Capturing uname is easy, but capturing mock or a similar fake environment builder is harder.
Take ideas from the Debian buildinfo files.
What info does a buildinfo file need for reproducibility, and what kind of tool would make it easy?
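The "vary time, then env, then path" test idea from the notes above could start from something as small as the sketch below. It is not an existing RPM or mock tool: the function name is made up, the "build" is any command that writes its result to stdout, and only the time zone and locale are varied.

```shell
# Hypothetical check: run the same command under two different environments
# and compare the SHA-256 of what it writes to stdout.
# Returns 0 if both runs produce identical output, non-zero otherwise.
same_output_across_envs() (
    first=$(env TZ=UTC LC_ALL=C "$@" | sha256sum)
    second=$(env TZ=Asia/Tokyo LC_ALL=en_US.UTF-8 "$@" | sha256sum)
    [ "$first" = "$second" ]
)
```

A real test suite would vary more dimensions (build path, hostname, time) and compare the built packages with diffoscope rather than a bare hash.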
layout: event_detail
title: RPMII
event: athens2015
order: 100
permalink: /events/berlin/2016/RPMII/
* create a tool to generate buildinfo files similar to Debian's
* later create or extend a tool to use buildinfo to create a similar environment to rebuild a package later
Next steps:
* put the pieces together
* test it
* add it to a git repo
- buildinfo spec (Debian):
- RPM file format (draft?):
example buildinfo files at
to be run at the end of rpmbuild or after it, run by the tool calling rpmbuild or both (second one appending extra information)
#buildinfo generator code snippet:
echo Installed-Build-Depends:
# might need to run outside the build chroot, because it might have an incompatible rpm version that cannot read the DB created by a newer rpm
rpm -qa | sed -e 's/-\([^-]*-[^-]*\)\.\([^.]*\)$/:\2 (= \1)/; s/^/ /'
# ver rel arch
echo Environment:
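for var in $ENV_WHITELIST ; do # loop header is missing from the notes; ENV_WHITELIST is a placeholder name
eval value=\$$var
[ -n "$value" ] && echo " $var=\"$value\""
done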
# whitelist in dpkg:
function getos
{
test -r /etc/os-release && . /etc/os-release
if [ -z "$ID" ] ; then
ID=$(cat /etc/system-release)
fi
echo "$ID"
}
echo "Build-Origin: $(getos)"
echo "Build-Date: `date -R`" # - not from rpm because that will be $SOURCE_DATE_EPOCH
libc6:i386 (= 2.24)
libgcc:x86_64 (= 4.4.7-17.el6)
printf 'Format: 1.0\n'
printf 'Build-Architecture: %s\n' "$(uname -m)"
Source: $(rpmspec -q --queryformat '%{name}' "$specfile")
Binary: $(find "$(rpm --eval '%{_rpmdir}')" -name '*.rpm' | xargs rpm -qp --qf '%{name} ') # /usr/src/packages/RPMS/*/*.rpm or equivalent
Version: $(rpmspec -q --queryformat '%{version}-%{release}' "$specfile")
Architecture: $(rpm -q --queryformat '%{arch}' -p "$srcrpm")
# other:
Checksum-*: ... sha256sum $rpm $specfile $srcrpm # and rpm size # omit MD5+SHA1 because nobody should use that anymore
size=$(stat -c '%s' $rpm)
Build-Path: $(rpm --eval '%{_builddir}')
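To make the sed expression in the snippet above easier to check, here it is wrapped in a function that can be fed sample lines without an RPM database. The function name is made up for illustration, and the input is assumed to have the name-version-release.arch form that `rpm -qa` is expected to print.

```shell
# Hypothetical wrapper around the sed expression from the notes.
# Input:  one "name-version-release.arch" string per line.
# Output: " name:arch (= version-release)", indented by one space as in the
#         Installed-Build-Depends field.
format_build_depends() {
    sed -e 's/-\([^-]*-[^-]*\)\.\([^.]*\)$/:\2 (= \1)/; s/^/ /'
}
```

For example, `echo libgcc-4.4.7-17.el6.x86_64 | format_build_depends` prints ` libgcc:x86_64 (= 4.4.7-17.el6)`. Package names that themselves contain hyphens still work, because only the last two hyphen-separated fields are treated as version and release.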
Example looking at
layout: event_detail
event: athens2015
order: 180
permalink: /events/berlin/2016/SOURCE_PREFIX_MAP/
- set by a build-tool
- for mapping build paths
- honoured by GCC and every build tool that generates build-paths
- how to support multiple mappings
one or multiple mappings?
- multiple is better, allows for more intuitive overrides by child processes
env variable preferred over cli
separator character, space or newline?
how to apply the mappings when eventually set?
- multiple ordered mappings, child build processes append to this map (to the end)
- child build tools apply the mapping last to first
the exact format of the envvar
- expressing multiple paths in a single string is hard
- "common things easy, uncommon things possible"
- (infinity0, doko) research passing newlines through shell, m4, autoconf
- look at how gdb parses and loads symbol paths to source code paths
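The "append to the end, apply last to first" rule can be sketched as follows. The notes leave the exact format open, so this sketch simply assumes newline-separated old=new entries. Both that format and the function name are assumptions for illustration, not a proposal from the session.

```shell
# Hypothetical: rewrite a path using an ordered prefix map.
# $1 = path, $2 = map as newline-separated "old=new" entries (assumed format).
# Later entries take priority, which matches "apply last to first".
apply_prefix_map() (
    path=$1
    map=$2
    result=$path
    # Walking first to last and keeping the final match gives the same answer
    # as walking last to first and stopping at the first match.
    while IFS= read -r entry; do
        case $entry in
            *=*) ;;          # looks like old=new
            *) continue ;;   # skip empty or malformed lines
        esac
        old=${entry%%=*}
        new=${entry#*=}
        case $path in
            "$old" | "$old"/*) result=$new${path#"$old"} ;;
        esac
    done <<EOF
$map
EOF
    printf '%s\n' "$result"
)
```

A child build process that appends /build/sub=/sub to a parent's /build=/src therefore overrides the parent for paths below /build/sub. Paths containing "=" or newlines cannot be expressed in this assumed format, which is exactly the difficulty the notes point out.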
layout: event_detail
title: Agenda
event: athens2015
order: 10
permalink: /events/athens2015/agenda/
Reproducible Builds Summit II
December 13-15, 2016. Berlin, Germany
Day 1
Tuesday, December 13
09.15 Breakfast
10.00 Opening Session
11.00 Project Showcase
* Reproducible FreeBSD
* Status stretch and buster
* openSUSE
* Test How we constantly test Debian
* Eliminating absolute build paths from debugging info and other things
* OpenWrt, coreboot, LEDE
* F-Droid. Reproducible Android apps
* Building reproducible Tails ISO images [work in progress]
11.50 Break
12.15 Agenda Brainstorming
13.20 Lunch Break
14.30 Working Sessions I
* diffoscope
* reprotest
* Documentation
* User verification
* Embedded
16.00 Closing Session
Proposals for hacking sessions to take place later in the afternoon:
* SquashFS
* FreeBSD filesystems
* Python packages in git
* Gettext
* Make diffoscope deal with Android apks
* Markdowns
16:20 Adjourn
16:30 Hacking
[Please feel welcome to add here links to any documentation related to your hacking efforts]
Day 2
Wednesday, December 14
09.15 Breakfast
10.00 Opening Session
10:20 Working Sessions II
* build info files
* Reproducible images
* Defining reproducible builds I
* End user policies
* Test infrastructure
* Gettext
11.45 Break
12.30 Skill Share
[Note taking is not required during skill share discussions, but in case you took notes, please feel welcome to link them below]
* git-based packaging
* How to use C sanitizers + fuzzing
* How to make storage deduplicate and incentivize reproducible builds hash
* How to use buildinfos to analyze/test reproducibility
* Fedora AMA
* How to use emacs
* How to run a start-up
* How to apply for funding from CII
* How to improve cross-distro packaging
* How (not) to use iframes on awesome webpages *
* How to sign code in git (and correctly verify the signatures)
* Ask me anything about building on OSX
* How to do automatic hardware testing
13.00 Lunch Break
14.20 Working Sessions III
* What Else for the Auditable Ecosystem? (was )
* Documentation II
* Defining reproducible builds II
* Bootstrapping
* Reproducible builds use cases
* Reproducible builds and License/GPL compliance
16.00 Closing Session
Proposals for hacking sessions to take place later in the afternoon:
* is git acceptably secure?
* Make build images reproducible
* diffoscope
* Documentation
* Funding and CII
* RPM and hacking
* Nix build stuff to be incorporated with test at
* Bootstrap test jenkins to replicate
* Embedded images cross-distro
16:30 Adjourn
16.40 Hacking
[Please feel welcome to add here links to any documentation related to your hacking efforts]
Day 3
Thursday, December 15
09.15 Breakfast
10.00 Opening Session
10:20 Working Sessions IV
* Cross-distro collaboration on reproducible builds
* Bootstrapping II
* Documentation III
* Binary transparency II
* State of Reproducible Builds
11:50 Break
12:05 Reporting session outcomes
Proposals for hacking sessions to take place today:
* diffoscope debug
* diffoscope everything
* Documentation
* buildinfo
* looking at buildinfo coming from different architectures
* Gettext
* FreeBSD filesystems
* Android documentation
* Reproducing the test environment and documenting it
* Debian infrastructure
* Binary transparency log
* Android infrastructure
12:30 Hacking
[Please feel welcome to add here links to any documentation related to your hacking efforts]
* F-Droid and append-only publication log (documented through pictured sketch: reproduciblebuildsII-FDroidpublicationlog_01.jpg)
13:15 Lunch Break
14:30 2017 Look Ahead Session
15.00 Closing Session
15.30 Adjourn
layout: event_detail
title: binarytransparency
event: athens2015
order: 170
permalink: /events/berlin/2016/binarytransparency/
layout: event_detail
title: binarytransparencyII
event: athens2015
order: 280
permalink: /events/berlin/2016/binarytransparencyII/
Can we use the same idea as certificate transparency for the packages?
Why do we need this:
* To get the common idea of what build result should be;
* People who do not want to build from source can read who has produced the same binary they are trying to install;
How does it work for website certificates: we have the information about when the certificate was issued and when it expires. But what do we do with the idea of revocation?
Problems with revocation:
* How do I know the patch manifest is the most recent one?
* A man-in-the-middle can deny us the ability to learn that the certificate was revoked (or even remember the "right" answer and replay it later).
What can we use for the packages:
Hash of the log file.
We need information about all previous changes - maybe there were revocations?
Do we need every change of every, e.g. Debian package?
Or can we keep a separate revocation log: not really useful.
Revocation problem: there are at least 2 different situations:
1. I found a bug in my test infrastructure, so please ignore my last result(s)
2. I reproduced the build yesterday, but failed to reproduce it today.
Do we need to treat them differently?
Keep the logs;
Keep buildinfo files. If we reproduced - good, if not - check the inputs. If the inputs are different, this is not so surprising (although still can be a sign of non-reproducibility)
Look through all the logs:
1. Select all buildinfo-s for the packages;
2. Do all the outputs match?
3. If output is different - are inputs different?
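The three steps above can be sketched as a small filter. The input format is invented for illustration: one line per build of a single package, with the fields "builder input-hash output-hash", as they might be extracted from buildinfo files. The function name and the result labels are hypothetical as well.

```shell
# Hypothetical classifier for the builds of ONE package.
# stdin: lines of "builder input_hash output_hash" (assumed format).
classify_builds() {
    awk '
        { inputs[$2] = 1; outputs[$3] = 1 }
        END {
            n_in = 0;  for (k in inputs)  n_in++
            n_out = 0; for (k in outputs) n_out++
            if (NR == 0)         print "no-records"
            else if (n_out == 1) print "outputs-match"
            else if (n_in > 1)   print "outputs-differ-inputs-differ"
            else                 print "outputs-differ-same-inputs"
        }'
}
```

Only the last case, differing outputs from identical recorded inputs, points directly at non-reproducibility. Differing inputs make a differing output unsurprising, as noted above.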
Put buildinfos into a log. Log has tree structures. Log infrastructure should be:
* public: no targeted attacks.
* auditable: if log is permanent, view is consistent both over time and between users.
We want to look up output binaries later to run diffoscope on them.
Keep them somewhere (cloud) stored by hash instead of names.
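Storing outputs "by hash instead of names" could look like the sketch below. The function name and the flat directory layout are assumptions; a real store would more likely be remote and sharded. It relies on sha256sum from coreutils.

```shell
# Hypothetical content-addressed store: copy a file into a directory under
# its SHA-256 digest and print that digest.
# $1 = file to store, $2 = store directory (created if missing).
store_by_hash() (
    file=$1
    store=$2
    hash=$(sha256sum "$file" | cut -d' ' -f1)
    [ -n "$hash" ] || exit 1    # refuse to store anything without a digest
    mkdir -p "$store"
    cp "$file" "$store/$hash"
    printf '%s\n' "$hash"
)
```

Because the file name is the output hash that a buildinfo file records, the stored binary can be located later from the log entry alone and handed to diffoscope.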
Hash tree log:
-> Trust buildinfo records visible
-> We can share this between distros!
We put buildinfo in there, because:
-> Now anyone can collect it and check if info was gathered correctly.
The fact that buildinfo captures output hash gives us the opportunity to look up this hash later, find the stored output binary and run diffoscope.
Just agree to log; then everyone can choose how to interpret the logs.
layout: event_detail
title: bootstrappingII
event: athens2015
order: 260
permalink: /events/berlin/2016/bootstrappingII/
From email sent to the list on 12/16/2016:
Hello there!
Here’s a WIP preview of what will soon/eventually be available at (once the domain has been assigned and
mapped). It’s the website for our new “bootstrappable builds” project
that was born during the Reproducible Builds summit 2016
(i.e. yesterday).
Thanks to all the prolific writers in the bootstrapping sessions who
contributed so much eloquent prose! (All mistakes and omissions are
This is currently running on a weak box in my living room, so please be
gentle with traffic:
Until we get proper code hosting (already arranged for, just waiting)
the code for the website is available here:
Comments and patches are very welcome!
layout: event_detail
title: buildinfofiles
event: athens2015
order: 90
permalink: /events/berlin/2016/buildinfofiles/
= early work
a goal was to minimise the conditions needed to reproduce a binary
buildinfo would be a formula to reproduce a build - it should be as small as possible
they don't/can't describe every possible input - build process is affected by obscure things or external, variable factors
1. buildinfo files:
record inputs to the build that produced the output - so that you can recreate its state
2. analysis of buildinfo and outputs:
as more builders provide buildinfo files, we can look for intersections (reproducible binaries), and causes of any differences (non-reproducibility)
should contain the minimal information needed to produce a given binary
3. the ideal (reproducible) build would depend only on the source code and build dependencies
buildinfo should be small, compact, and easily distributable
= they might contain:
source package (name, version, hash?)
binaries produced (name, arch, checksums)
build dependencies (recursively)
build path (until recently?)
environment variables (since recently?)
in Debian, buildinfo is a separate file
in Arch Linux, buildinfo is included in the package files (but signatures are detached)
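For the "environment variables" item above, recording only a whitelist keeps the file small and avoids leaking unrelated settings. The sketch below is one way to do that. The function name is hypothetical, the whitelist is whatever names the caller passes in, and the output layout (a field name followed by indented NAME="value" lines) is an assumption for illustration.

```shell
# Hypothetical: print an "Environment:" field containing only the variables
# named on the command line that are set and non-empty.
# The names must be trusted identifiers, because eval is used for the lookup.
dump_environment() (
    echo "Environment:"
    for var in "$@"; do
        eval "value=\${$var-}"
        if [ -n "$value" ]; then
            printf ' %s="%s"\n' "$var" "$value"
        fi
    done
)
```

A call such as `dump_environment LANG LC_ALL SOURCE_DATE_EPOCH TZ` would then list only those of the four that are actually set. The names here are just examples, not an agreed whitelist.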
= consuming and aggregating buildinfo files:
in Debian, buildinfo files are used when:
* DD uploads a package
* debian-ftp system distributes packages
* end-user installs packages
and now we also realised:
* rebuilders
* buildinfo distributors
= further work
we want to collate and distribute buildinfo files from external parties too;
not just those from Debian developers and the official builds
collecting and distributing those, is a quite different task than just distributing buildinfo from Debian's official builds
lamby's already collects and distributes some non-official buildinfo files
we will need to write tools making it easy to test reproducibility and submit buildinfo,
and tools to retrieve buildinfo files/signatures when installing
signed buildinfos save people from having to build every package themselves -
it gives them sufficient confidence to trust pre-built binaries
= ongoing concerns
buildinfo files should be detailed enough to explain the causes of non-reproducibility
but too much information ($HOME, hostname, installed package versions)
argument arose that a normalised build environment avoids lots of reproducibility issues,
like build path, environment etc. affecting the build
whilst that would be easier, some of us think that is really a bug in the software that ought to be fixed
in the extreme case,
when a build-dependency affects an output binary, we may need to generate a new set of buildinfo files
describing that situation
layout: event_detail
title: crossdistro
event: athens2015
order: 250
permalink: /events/berlin/2016/crossdistro/
# Cross-distro
* web infrastructure for searching, sorting
* don't necessarily need a single database, but maybe the distros share code to run their respective databases
* what's in the buildinfo files
## What's Debian-specific about buildinfo files?
The dependencies are written as NVR - name, version, release - single string
Doc team working on getting buildinfo specification up to the doc page
Debian wiki says half the fields are the output of a deb-* tool
* But that page is outdated
Architecture names are different, and have slightly different semantics
* e.g., Fedora has subarchitectures of armhbf - You could build RPMs for just that family, not optimized for any subarchitecture
Considering adding a Known-Signature field to the buildinfo file - You're expected to copy it when you rebuild the RPM, since you don't have the private key to re-sign it
Which fields are necessarily distro-specific? Which can we ask the distros to conform to a spec?
Arch buildinfo files are included in the package, so they only include things that don't change. They're not even using RFC822 format.
Signatures: buildinfo files are made at build time, but the RPM is signed later. So presumably the buildinfo file will include the hash of the unsigned package. Same package can be signed by different keys; e.g., different keys for each release of Fedora.
Could include two checksums - One signed, one unsigned.
Let's publish a buildinfo spec 1.0, and have people simply try to work to fit into it. Then come back in a year and revise for 1.1 for whatever couldn't be handled with 1.0.
* Or is it too early for that? Some preference to update on a rolling basis to try to get problems addressed sooner.
* Document what fields are expected to vary across distributions - based on the content of a Distribution field.
## How can cross-distro communication happen?
r-b-general seems to be OK - but needs to be more widely advertised.
* Docs are going to include a "Get Involved" page aimed at distro folks. This needs to be part of that.
Problem that all the current r-b infrastructure is hosted on Debian. In order to get involved, you need to know Debian process: what channels to use, how to file bugs. It's a barrier to entry. Documentation could also help with this - although that still leaves a perceptual barrier to entry. Something Forge-like would be more friendly - Something that consolidates all the work in a single place. Fedora used to have FedoraHosted; now it has
Some discussion about the usability of - Submitting bugs is too hard.
We discuss whether it's practical to get distros building the same thing. Doesn't seem like it. Requires too much agreement - compiler default, critical patches, etc. - that isn't worthwhile.
## Generalizing buildinfo tools
What about capturing buildinfo for things like distro Docker images?
Is it possible to write a single tool that knows how to generate buildinfo for a wrapped build process? e.g., it would know how to capture common information like timestamp, environment variables, etc., and then has hooks for the specific output type it's generating - e.g., it knows how to capture build dependencies of a package, then how to record the build artifact at the end.
Or maybe the right thing to do is to teach different tools to add to an existing buildinfo file over the course of a build process. This involves extending a lot more tools, but might require less customization per tool.
Some feeling that the process will inevitably be distro-specific which necessarily limits the utility of any general-purpose tool.
Maybe a shared buildinfo validator and parser? Feel unsure about parsing - People are likely to just build on existing RFC822 parsers. Some interest in a tool that understands the semantics of the format better—i.e., if you add this field, you're expected to also add this other field. (Validator would help in many of the same ways.)
Can share the buildinfo database. e.g., generalize This part is more complex and less distro-specific, so it seems worthwhile to share. This also ties in with binary transparency/append-only logs discussion. Needs to move to
## Cross-distro patch tracking
issues-and-notes.yaml - Stored in a Git repository that lists distro-specific issues - Bernhard's site that shows the version of a package in each distro, with links to source repos and patch trackers (todo: bug-trackers?)
Unrelated cross-distro MLs ARM-specific?
## Gentoo
Gentoo is interested but it's trickier for them because they support so many build options. Build options should be recorded in the buildinfo file.
There's also a problem that GCC can get different patches that aren't reflected in the revision, so it's harder to record what you're really running.
buildinfo can be useful for Stage 3 binaries.
As a matter of policy, Gentoo packages can change without changing the package version
Does it make sense for buildinfo files to refer to each other? e.g., this package was built with this version of GCC, here's the buildinfo file for that...
CFLAGS need to be sorted. Probably best to do that when the buildinfo file is written. Just pick an order and use that.
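"Just pick an order" could be as simple as a byte-wise sort at the moment the buildinfo file is written. The sketch below assumes the flags arrive as one whitespace-separated string with no quoted spaces; the function name is made up.

```shell
# Hypothetical: put whitespace-separated flags into a fixed, locale-independent
# order for recording. Does not handle flags that contain quoted spaces.
normalize_flags() {
    printf '%s\n' "$1" | tr -s ' \t' '\n\n' | sed '/^$/d' | LC_ALL=C sort | paste -s -d ' ' -
}
```

This is only meant for the recorded copy. The order of some flags (for example include paths) matters to the compiler, so the flags actually passed to the build should stay in their original order.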
layout: event_detail
title: diffoscope
event: athens2015
order: 30
permalink: /events/berlin/2016/diffoscope/
# diffoscope plan for the meeting
## Reviewing the Post-Its
### Improve Platform Support
- Portability to different platforms should improve. Some tools or used abstractions (like /dev/fd) might not be available.
- Support for distro-specific or uncommon file formats can be improved.
- The testsuite does not work everywhere (e.g. newer versions of software can break things, currently happens for Pascal). It should be more reliable.
### Integrate debdiff & diffoscope
- Should we implement a flag in debdiff to call diffoscope?
- Should we replace debdiff with diffoscope completely? What features of debdiff would diffoscope still need?
- It's not really clear what the post-it author wants.
### Parallel diffoscope (#842837)
- Execution time is a serious issue, diffoscope should get faster.
- It is not clear how well parallel Python is going to work for diffoscope due to the global interpreter lock.
- Prior work by nomeata might exist somewhere (FIXME where?).
- We should have a hack session on parallel diffoscope.
### Marketing/Docs/Undebianization
- should be more well-known; it helpfully has a number of optional stuff for uncommon formats installed already.
- Get the word out for non-reproducibility use cases like comparing across versions for updates or due diligence before deployment.
- The bug tracking happens at the Debian bugtracker. That should be more visible. The website should have a "how to report bugs" section for people not familiar with the Debian bug tracker.
### diffoscope Plugins
- Should diffoscope have a plugin mechanism to support other file formats that authors may not want to upstream?
### Output Format
- Should diffoscope output markdown?
- The output should be more accessible, e.g., for screen readers. Possibly to be implemented as a new output format.
### What are Usability Issues with diffoscope?
- Output limits could lead to spending a lot of processing time and then still not getting usable output. Should all arbitrary limits be removed?
- Short command line options
- Write documentation on how to implement support for a file format
- Should diffoscope support excluding specific paths in archives to cut down runtime and ignore parts that are already known to differ?
- Should diffoscope support disabling support for specific file formats?
### Automatic Classification of Reproducibility Issues in diffoscope
- Is this in scope for diffoscope? Should this be in a separate tool?
- This would require knowledge of the format and lead to much more complex file format support.
- This could be helpful in outputs, e.g. if an offset changes in an ELF binary you'd get lots of related changes that could otherwise be ignored
## Action Items
- lamby to open a bug for the output format accessibility
- Bapt to submit his FreeBSD patches upstream
- everyone to file tickets about portability problems
## Session Proposals
- Hacking parallel diffoscope
- Porting diffoscope
- diffoscope usability
- Documenting, marketing and undebianizing diffoscope