STF M5: scaling with the cloud
Description and rationale
By leveraging the API and the availability of the Debian repository within debusine, Debian developers can easily experiment with changes impacting the whole Debian archive. Such experiments can overload debusine workers and delay the more important day-to-day workload. Proper scheduling of tasks is introduced to prevent this, while also ensuring that archive-wide experiments do not systematically take excessively long to complete because of their second-class status.
To reconcile these conflicting needs, debusine has been extended to leverage the power of the cloud: supplementary workers can be spun up to process the queue when it grows too large, and cloud storage can absorb the data generated by those experiments when local disk space is running low. Support is planned for Amazon’s cloud (Debian has free credits there), for Hetzner as a German cloud provider, and for two other cloud providers requested by the community.
Debusine administrators can configure workers in different ways to ensure availability of workers for specific use cases:
- Worker restricted to a (set of) workspace(s)
- Worker restricted to specific tasks
- Worker restricted to high priority work requests
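
The three kinds of restriction above can be combined on a single worker. The following is a minimal sketch in Python of how such a check might look; it is not debusine's actual data model or API. The names `WorkRequest`, `WorkerPolicy`, `accepts` and `min_priority` are hypothetical, and the sketch assumes that priorities are integers where a larger value means more urgent.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class WorkRequest:
    # Hypothetical, simplified view of a work request.
    workspace: str
    task_name: str
    priority: int  # assumption: larger value means more urgent


@dataclass(frozen=True)
class WorkerPolicy:
    # Hypothetical per-worker restrictions; None means "unrestricted".
    workspaces: Optional[FrozenSet[str]] = None
    task_names: Optional[FrozenSet[str]] = None
    min_priority: Optional[int] = None

    def accepts(self, request: WorkRequest) -> bool:
        """Return True if a worker with this policy may run the request."""
        if self.workspaces is not None and request.workspace not in self.workspaces:
            return False
        if self.task_names is not None and request.task_name not in self.task_names:
            return False
        if self.min_priority is not None and request.priority < self.min_priority:
            return False
        return True


# A worker reserved for urgent work in one workspace.
reserved = WorkerPolicy(workspaces=frozenset({"debian"}), min_priority=10)
# An unrestricted worker, for example one started on demand in the cloud.
generic = WorkerPolicy()
```

With these definitions, `reserved` refuses a low-priority archive-wide rebuild while `generic` accepts it, so day-to-day work always has a worker available.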
Developer perspective
To be able to leverage the cloud, some of debusine’s key concepts need to be reworked. Artifacts can be uploaded to the cloud and retrieved from it. We also need to record metrics on work requests so that we can estimate the ideal size of the worker pool needed to process what’s currently in the queue. The pool can then be auto-sized at all times, while respecting predefined cost constraints.
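
As an illustration of sizing the pool from recorded metrics, here is a minimal sketch in Python. It is one possible heuristic, not debusine's algorithm: it assumes that the metrics yield an estimated duration for each queued work request, that the administrator sets a target time for draining the queue, and that the cost constraint is a maximum hourly spend. The function name and all parameter names are hypothetical.

```python
import math
from typing import Sequence


def extra_cloud_workers(
    estimated_durations_s: Sequence[float],  # per queued work request, from metrics
    static_workers: int,                     # workers that are always available
    drain_target_s: float,                   # desired time to empty the queue
    cost_per_worker_hour: float,             # price of one cloud worker
    max_hourly_cost: float,                  # predefined cost constraint
) -> int:
    """Return how many cloud workers to run on top of the static ones."""
    total_work_s = sum(estimated_durations_s)
    # Workers needed to finish the queued work within the target time,
    # assuming the work parallelises perfectly across workers.
    needed = math.ceil(total_work_s / drain_target_s)
    wanted = max(0, needed - static_workers)
    # Never exceed what the hourly budget allows.
    affordable = int(max_hourly_cost // cost_per_worker_hour)
    return min(wanted, affordable)
```

For example, ten queued requests of one hour each, two static workers and a two-hour target call for five workers in total, so three cloud workers are requested as long as the budget covers them. A real scheduler would also have to account for worker start-up time and for tasks that only some workers can run.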