Need catch-up scraper to deal with lost webhook events

See salsa/support#516. I had hoped from the docs and UI that GitLab would actually deliver tag webhook events reliably. It has a retry mechanism and everything! But, no.

I did some experimenting:

  • The "List all projects" API can get all projects. See https://docs.gitlab.com/api/projects/#list-all-projects

    Empirically, I tried https://salsa.debian.org/api/v4/projects?order_by=updated_at&updated_after=2025-10-13T10:00:00+01:00

    It functioned as expected. Notably, re-pushing an old tag in dgit-test-dummy did move dgit-test-dummy back to the top of the list, and its last_activity_at was the time of my push.

    I think we can use this to maintain a list of repos we need to examine.

  • The "List tags" API sort of works. See https://docs.gitlab.com/api/tags/#list-project-repository-tags

    Empirically, I tried https://salsa.debian.org/api/v4/projects/36575/repository/tags?order_by=updated

    It does list all the tags, but the order is by the date within the tag. Re-pushing an old tag doesn't get it back to the top of the list.

    I think this means that to rescan one repository we need to query for all tags more recent than our tag expiry timeout, not just all tags more recent than our last scan. That would be all tags made in the last week, which is hopefully not too onerous.
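A rough sketch of how the two queries above might fit together, in Python. It only builds the request URLs and filters an already-fetched tag list; fetching, pagination and authentication are left out. The function names, the one-week default and the choice of timestamp field are assumptions for illustration: I am guessing that each tag object carries a top-level `created_at` (possibly null) and a `commit.created_at`, which would need checking against real responses from salsa.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

API = "https://salsa.debian.org/api/v4"


def projects_url(updated_after, page=1):
    """URL listing projects with activity since `updated_after` (an aware datetime)."""
    query = urlencode({
        "order_by": "updated_at",
        "updated_after": updated_after.isoformat(),
        "page": page,
    })
    return f"{API}/projects?{query}"


def tags_url(project_id, page=1):
    """URL listing one project's tags, ordered by the date within the tag."""
    query = urlencode({"order_by": "updated", "page": page})
    return f"{API}/projects/{project_id}/repository/tags?{query}"


def parse_time(text):
    # Accept a trailing "Z" as well as a numeric offset.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def tag_time(tag):
    # ASSUMPTION: prefer the tag's own created_at, fall back to the commit's.
    stamp = tag.get("created_at") or tag["commit"]["created_at"]
    return parse_time(stamp)


def tags_to_examine(tags, now, expiry=timedelta(days=7)):
    """Keep tags dated within the expiry window, regardless of our last scan time."""
    cutoff = now - expiry
    return [t for t in tags if tag_time(t) >= cutoff]
```

The filter deliberately uses the tag expiry timeout rather than the last scan time as the cutoff, since a re-pushed old tag keeps its original date and would otherwise be missed.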
