
Cloud Compute: Provisioning Celery Service

Part of #538 (closed):

Create a new Celery service to manage cloud worker pools, as described here.

Include a basic framework for regularly running a task that decides whether to spin instances up or down. The task doesn't need to do anything yet.
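A minimal sketch of such a no-op framework might look like the following. `provision_worker_pools`, `ProvisioningDecision` and the task path `tasks.provision_worker_pools` are hypothetical names chosen for illustration. The schedule dict follows the format of Celery's `beat_schedule` setting, but the Celery app itself is left out so the sketch runs without Celery installed.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class ProvisioningDecision:
    """Outcome of one provisioning pass (hypothetical structure)."""

    # Pool name -> number of new instances to start.
    to_launch: dict[str, int] = field(default_factory=dict)
    # Names of workers to terminate.
    to_terminate: list[str] = field(default_factory=list)

    def is_noop(self) -> bool:
        """True when the pass decided to change nothing."""
        return not self.to_launch and not self.to_terminate


def provision_worker_pools() -> ProvisioningDecision:
    """Decide whether to spin cloud instances up or down.

    Placeholder: for now it always decides to do nothing.
    """
    return ProvisioningDecision()


# Same shape as Celery's `beat_schedule` setting; with a real app this would
# be assigned via `app.conf.beat_schedule = BEAT_SCHEDULE`, and the function
# above would be registered as a task (hypothetical task path below).
BEAT_SCHEDULE = {
    "provision-worker-pools": {
        "task": "tasks.provision_worker_pools",
        "schedule": timedelta(minutes=1),
    },
}
```

Returning an explicit decision object rather than acting directly keeps the placeholder trivially testable, and later commissioning/decommissioning logic can fill it in without changing the scheduling wiring.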

Implementation plan:

  • Change WorkersManager.waiting_for_work_request to return static workers first (!1665 (merged))
  • Implement computing a worker's idle time over the last X hours (!1668 (merged))
  • Implement a pool function to terminate a worker that disables it first, so that the scheduler does not allocate work requests to it while the termination command completes (!1669 (merged))
  • Implement decommissioning: look at idle cloud workers only. If their idle time is above a certain threshold, terminate them (!1669 (merged))
  • Store worker pool statistics on task completion / worker shutdown (!1708 (merged))
  • Implement commissioning (!1661 (merged)):
    • look at pending work requests, grouped by scope, to check them against the latency targets
    • for each scope, take the list of pending work requests and match them with available static workers and worker pools
      • if the existing worker count is insufficient to handle the pending load within the target time, compute the number of workers to provision in each pool, in pool priority order, to meet the target
      • when a pool's limit is reached, move on to the next pool, or stop if maximum capacity has been reached
      • possibly cap the number of new workers provisioned in response to spikes of pending work requests from big workflows that have just been populated. This may not be needed: looking at the estimated processing time may be enough to absorb fluctuations, since a newly populated huge workflow will likely have an estimated processing time that dominates current worker availability
  • Add a Celery worker and systemd service (!1695 (merged))
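The decommissioning steps in the plan above (idle time over a window, then disable-before-terminate for idle cloud workers) can be sketched as follows. The `Worker` fields, `idle_time`, `terminate` and `decommission` are hypothetical stand-ins rather than the project's actual models; the sketch assumes a worker's busy intervals do not overlap, and it flips flags instead of calling a cloud provider.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Worker:
    """Hypothetical stand-in for a worker record."""

    name: str
    is_cloud: bool
    # Non-overlapping (start, end) intervals during which the worker was busy.
    busy_intervals: list[tuple[datetime, datetime]] = field(default_factory=list)
    busy_now: bool = False
    enabled: bool = True
    terminated: bool = False


def idle_time(worker: Worker, now: datetime, window: timedelta) -> timedelta:
    """Idle time of `worker` within [now - window, now]."""
    start = now - window
    busy = timedelta()
    for b_start, b_end in worker.busy_intervals:
        # Clip each busy interval to the window before summing it.
        lo, hi = max(b_start, start), min(b_end, now)
        if hi > lo:
            busy += hi - lo
    return window - busy


def terminate(worker: Worker) -> None:
    # Disable first so the scheduler stops assigning work requests...
    worker.enabled = False
    # ...then issue the (possibly slow) termination command.
    worker.terminated = True


def decommission(
    workers: list[Worker], now: datetime, window: timedelta, threshold: timedelta
) -> list[str]:
    """Terminate idle cloud workers whose idle time exceeds `threshold`."""
    terminated = []
    for worker in workers:
        # Only consider enabled cloud workers that are idle right now.
        if not worker.is_cloud or worker.busy_now or not worker.enabled:
            continue
        if idle_time(worker, now, window) > threshold:
            terminate(worker)
            terminated.append(worker.name)
    return terminated
```

Disabling before terminating closes the race where the scheduler hands a work request to an instance that is about to disappear.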
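The per-scope commissioning calculation in the plan above could be sketched like this: estimate how many workers are needed to drain the pending load within the latency target, subtract the workers already available, optionally cap the result, and fill the shortfall pool by pool in priority order until pool limits are exhausted. `Pool`, `workers_needed` and `plan_scope` are hypothetical; the sketch treats the load as perfectly divisible across workers (it ignores that one long work request cannot be split) and assumes a lower `priority` value means "try first".

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    """Hypothetical stand-in for a worker pool."""

    name: str
    priority: int  # lower value is tried first (assumption)
    limit: int  # maximum number of instances in this pool
    running: int = 0  # instances already running


def workers_needed(estimated_seconds: list[float], target_seconds: float) -> int:
    """Workers required to drain the pending load within the target time."""
    total = sum(estimated_seconds)
    return math.ceil(total / target_seconds) if total else 0


def plan_scope(
    estimated_seconds: list[float],
    target_seconds: float,
    available_workers: int,
    pools: list[Pool],
    max_new: Optional[int] = None,
) -> dict[str, int]:
    """Return {pool name: new instances} for one scope.

    Updates `pool.running` so that a later scope sharing the same pools
    sees the reduced headroom.
    """
    shortfall = workers_needed(estimated_seconds, target_seconds) - available_workers
    if max_new is not None:
        # Optional cap to avoid over-reacting to spikes.
        shortfall = min(shortfall, max_new)
    plan: dict[str, int] = {}
    for pool in sorted(pools, key=lambda p: p.priority):
        if shortfall <= 0:
            break
        # Move on to the next pool once this one's limit is reached.
        n = min(shortfall, pool.limit - pool.running)
        if n > 0:
            plan[pool.name] = n
            pool.running += n
            shortfall -= n
    # Any remaining shortfall means maximum capacity has been reached.
    return plan
```

Running scopes one after another against the same `Pool` objects is one simple way to respect pool limits across scopes; a real implementation might instead allocate capacity across scopes by priority.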
Edited by Colin Watson