Need catch-up scraper to deal with lost webhook events
See salsa/support#516. I had hoped from the docs and UI that gitlab would actually do reliable delivery of tags. It has a retry mechanism and everything! But, no.
I did some experimenting:
-
The "list projects" API can get all projects. Ie https://docs.gitlab.com/api/projects/#list-all-projects
Empirically, I tried
https://salsa.debian.org/api/v4/projects?order_by=updated_at&updated_after=2025-10-13T10:00:00+01:00
It functioned as expected. Notably, re-pushing an old tag in dgit-test-dummy did push dgit-test-dummy back to the top of the list, and the last_activity_at was the time of my push.
I think we can use this to maintain a list of repos we need to examine.
-
The "List tags" API sort of works. Ie https://docs.gitlab.com/api/tags/#list-project-repository-tags
Empirically, I tried
https://salsa.debian.org/api/v4/projects/36575/repository/tags?order_by=updated
It does list all the tags, but the order is by the date within the tag. Re-pushing an old tag doesn't get it back to the top of the list.
I think this means that to rescan one repository we need to query for all tags more recent than our tag expiry timeout, not just all tags more recent than our last scan. That would be all tags made in the last week, which is hopefully not too onerous.
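
To make the two-step idea concrete, here is a rough Python sketch of the catch-up logic, not a worked-out implementation. The names `projects_url`, `tags_to_examine` and `TAG_EXPIRY` are made up for illustration. It assumes each tag object in the "list tags" response carries a `created_at` timestamp for the date within the tag, which would need checking against what salsa actually returns. It does no HTTP and no pagination; it only builds the project query and filters an already-fetched tag list.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

SALSA_API = "https://salsa.debian.org/api/v4"
TAG_EXPIRY = timedelta(days=7)  # assumption: "the last week" mentioned above


def projects_url(updated_after):
    """Build the "list projects" query for projects active since `updated_after`."""
    query = urlencode({
        "order_by": "updated_at",
        # urlencode percent-encodes the ':' and '+' in the timestamp.
        "updated_after": updated_after.isoformat(),
    })
    return f"{SALSA_API}/projects?{query}"


def tags_to_examine(tags, now, expiry=TAG_EXPIRY):
    """Return the tags whose own date is within `expiry` of `now`.

    Filters on the expiry window rather than on the time of the last scan,
    because a re-pushed old tag keeps its original date.
    """
    cutoff = now - expiry
    recent = []
    for tag in tags:
        stamp = tag.get("created_at")  # assumed field name, see lead-in
        if stamp is None:
            continue  # assumption: undated (lightweight) tags are not of interest
        # Older Pythons do not accept a trailing "Z" in fromisoformat().
        when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if when >= cutoff:
            recent.append(tag)
    return recent
```

The scraper would then fetch `projects_url(last_scan_time)`, and for each project returned, fetch its tags and pass them through `tags_to_examine` before checking them against what the webhook already delivered.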