Supersede old package versions in archives
@hertzog asked about the plan to remove old package versions in !2221 (comment 650841), and I realized that we don't currently have an issue for this. We can get away without it in the very short term, but of course it's something we need to implement soon.
Removing old versions can't be done at the same time as adding new ones. For a suite with published indexes, old versions may only be removed once they're no longer referenced by new indexes, since otherwise there'll be a time window when clients expect to be able to fetch those versions and can't. Debian repository implementations typically have some kind of a grace period between packages being superseded by newer versions and being removed from the pool; in that period, they don't appear in package indexes, but they're still in the pool so that clients that have recently done apt update are reasonably unlikely to get 404s from a following apt full-upgrade or similar.
We have a couple of different retention periods for collection items right now, described in https://freexian-team.pages.debian.net/debusine/explanation/expiration-of-data.html#collection-items. However, these aren't enough for archives, because full_history_retention_period is already used for snapshots: for example, leaving that field unset allows using snapshot queries to look at the entire history of the archive, but that doesn't mean you want newly-generated Sources and Packages files to grow indefinitely as new package versions are added.
There's also the wrinkle that in some cases multiple versions of a package, particularly source packages, ought to be retained: a newer version of the source package might have failed to build on some architectures, or current binary packages might refer to older versions via Built-Using. For example, at the time of writing:
$ curl -s https://deb.debian.org/debian/dists/unstable/main/source/Sources.xz | xzcat | grep-dctrl -sPackage,Version,Extra-Source-Only -PX alabaster
Package: alabaster
Version: 0.7.8-1.1
Extra-Source-Only: yes
Package: alabaster
Version: 0.7.12-1
Extra-Source-Only: yes
Package: alabaster
Version: 0.7.16-0.1
To handle all this, we need some way to capture each of these steps in a package version's lifecycle:
1. version is current and included in indexes
2. version should be removed eventually (whether due to a new package version, or an explicit removal) but is still present in the suite's latest indexes
3. version is no longer present in the suite's latest indexes, but we're still keeping it around for a short grace period
4. version has been removed, but is still available in snapshots
5. version is no longer available, but the fact that it once existed is still visible in the history of the collection
6. version has been forgotten
full_history_retention_period captures the transition from 4 to 5, and metadata_only_retention_period captures the transition from 5 to 6. But at the moment the only thing we have that could capture anything between 1 and 4 is setting CollectionItem.removed_at.
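To make the gap concrete, here is a rough sketch that numbers the lifecycle steps as above and lists which transitions have no dedicated retention setting today. The `Stage` enum, `TRANSITION_SETTINGS` and `uncovered_transitions` are names invented for this illustration; they are not existing debusine code.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical names for the six lifecycle steps listed above."""

    CURRENT = 1
    PENDING_REMOVAL = 2  # to be removed, but still in the latest indexes
    GRACE_PERIOD = 3  # out of the latest indexes, still in the pool
    REMOVED_IN_SNAPSHOTS = 4
    METADATA_ONLY = 5
    FORGOTTEN = 6


# Transitions that an existing retention setting already captures,
# as described in the text.
TRANSITION_SETTINGS = {
    (Stage.REMOVED_IN_SNAPSHOTS, Stage.METADATA_ONLY): "full_history_retention_period",
    (Stage.METADATA_ONLY, Stage.FORGOTTEN): "metadata_only_retention_period",
}


def uncovered_transitions() -> list[tuple[Stage, Stage]]:
    """Return consecutive transitions with no dedicated retention setting."""
    stages = list(Stage)
    return [
        (before, after)
        for before, after in zip(stages, stages[1:])
        if (before, after) not in TRANSITION_SETTINGS
    ]


if __name__ == "__main__":
    for before, after in uncovered_transitions():
        print(f"{before.name} -> {after.name}")
```

The three transitions it reports (1 to 2, 2 to 3, 3 to 4) are the ones that currently all have to be squeezed into the single removed_at timestamp.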
A fairly simple option would be to record an additional "superseded" time for each collection item. This would indicate that the item is eligible for removal once it's no longer referenced. We don't have explicit relations from indexes to the packages they contain, because the number of database rows would get really out of hand if we did that. It can be made relatively safe anyway: the code that decides whether to remove superseded items can look for when the first set of indexes was generated after their superseded time, and only remove them once some grace period has elapsed after that time.
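A minimal sketch of that removal check might look like the following. It assumes we can get the superseded time of an item and the generation times of the suite's indexes as plain datetimes; `removable_at`, `can_remove` and their parameters are hypothetical names for illustration, not existing debusine APIs.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def removable_at(
    superseded_at: datetime,
    index_generation_times: Iterable[datetime],
    grace_period: timedelta,
) -> Optional[datetime]:
    """Return the earliest time the item may be removed, or None.

    None means no indexes have been generated since the item was
    superseded, so the latest published indexes may still reference it.
    """
    # Only indexes generated strictly after the superseded time are known
    # to have been built without the item.
    later = [t for t in index_generation_times if t > superseded_at]
    if not later:
        return None
    # The grace period starts at the first such index generation.
    return min(later) + grace_period


def can_remove(
    now: datetime,
    superseded_at: datetime,
    index_generation_times: Iterable[datetime],
    grace_period: timedelta,
) -> bool:
    """Return True if the grace period has fully elapsed."""
    when = removable_at(superseded_at, index_generation_times, grace_period)
    return when is not None and now >= when


if __name__ == "__main__":
    utc = timezone.utc
    superseded = datetime(2025, 1, 1, 12, tzinfo=utc)
    indexes = [datetime(2025, 1, 1, 6, tzinfo=utc), datetime(2025, 1, 1, 18, tzinfo=utc)]
    print(removable_at(superseded, indexes, timedelta(days=1)))
```

Anchoring the grace period on index generation rather than on the superseded time itself means a suite whose indexes are regenerated rarely doesn't lose packages that its published indexes still list.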
Keeping multiple versions of source packages around is a bit more complicated in this scheme, because they might be superseded by a newer version, remain needed for ages, and then stop being needed. We could track both times (superseded and became-unreferenced) if we wanted to be very careful. However, I don't think that's really necessary: the problem with apt client races is mostly solved by having any newer version available, so it isn't a big deal if a package that has been superseded for ages eventually drops out of indexes and the pool at around the same time.
The GenerateSuiteIndexes task would need to exclude superseded packages, except for source package versions where there are still non-superseded binary package versions that were either built from those source package versions or reference them in Built-Using. Given current timings, I think we have enough headroom to do this on the fly.
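As a sketch of the selection rule, assuming in-memory records rather than the real database models: the `Source` and `Binary` dataclasses and `select_for_indexes` are hypothetical and only illustrate the filtering logic, not how GenerateSuiteIndexes is actually structured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for a source package item in a suite."""

    name: str
    version: str
    superseded: bool = False


@dataclass(frozen=True)
class Binary:
    """Hypothetical stand-in for a binary package item in a suite."""

    name: str
    version: str
    source: tuple[str, str]  # (name, version) of the source it was built from
    built_using: frozenset[tuple[str, str]] = frozenset()
    superseded: bool = False


def select_for_indexes(
    sources: list[Source], binaries: list[Binary]
) -> tuple[list[Source], list[Binary]]:
    """Pick the items that should appear in newly generated indexes."""
    # Superseded binaries never appear in new indexes.
    live_binaries = [b for b in binaries if not b.superseded]

    # Source versions still needed by a non-superseded binary, either
    # because it was built from them or lists them in Built-Using.
    needed: set[tuple[str, str]] = set()
    for binary in live_binaries:
        needed.add(binary.source)
        needed.update(binary.built_using)

    kept_sources = [
        s for s in sources if not s.superseded or (s.name, s.version) in needed
    ]
    return kept_sources, live_binaries
```

A real implementation would presumably do this with database queries, and would also need to decide whether retained-but-superseded sources get marked with Extra-Source-Only as in the Debian example above.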
Finally, we need something to remove superseded packages. This is probably another server task that runs as part of update_suite.