delete_expired: Optimize calculation of which artifacts must be kept

The traverse-and-mark algorithm here is an ideal application for a recursive CTE, allowing most of the work to be done in the database.

We still have to pull the list of artifact IDs to keep into Python to work around poor PostgreSQL planner behaviour, but with current production databases this is quick enough not to be worth worrying about.

On debusine.debian.net, this takes the "mark" part of DeleteExpiredArtifacts from about eight minutes to about four seconds, which is important since it currently runs with heavyweight locks held. I checked that it returns the same set of artifacts as before.

This should help with #473 and #811.

Merge request reports

Loading