Waiting for a whole workflow to finish is difficult

I'd like to update the integration tests to use workflows, now that we have them: partly to exercise them, and partly because it would let us avoid having to pass around environment IDs. However, it's a bit difficult right now. debusine create-workflow returns the work request that acts as the root of the workflow (reasonably enough), but that work request is marked as completed as soon as the corresponding workflow orchestrator has finished, not when the work it scheduled has finished. That means we can't just use the usual debusine on-work-request-completed arrangement that the integration tests use for synchronization right now. This feels as though we've made a design mistake somewhere, because the point of using work requests for workflows was to reduce the amount of stuff we need to duplicate.

One possibility would be to have a separate synchronization-point work request at the end of every workflow that depends on everything in the workflow, and the design sort of contemplates that when it says they're "used to represent the entry or exit points" of workflows. However, I don't think we really thought this through well enough. Having separate entry and exit points increases the complexity of rendering workflows in the UI, and means that developers have to think about which one they ought to declare dependencies on. And it means that debusine create-workflow and debusine on-work-request-completed can't really work together in the natural way, because the exit point wouldn't exist until the workflow orchestrator has been run once and so the user doesn't know what ID to wait for.

An alternative approach would be to have a different state for parent work requests (which might be workflow callbacks or synchronization points) as long as their child requests are pending or running. I thought about using RUNNING or BLOCKED for this, but neither is quite right.

  • RUNNING is awkward because server tasks such as UpdateSuiteLintianCollection run on a (Celery) worker but can have child requests, and we don't want them to take up a worker slot just because they're waiting for child requests to finish.
  • BLOCKED is sort of right, but it implies that the task hasn't been able to start yet because of a dependency; this is a bit different because the task has started but can't be marked as completed yet. It would also seem strange to use dependencies both for things that have to complete before the root workflow orchestrator can run and for things that are created by that orchestrator.
  • A new WAITING state would be possible, but is a bit too similar to PENDING.
  • Maybe a new BLOCKED_CHILDREN state that's cleared once all children have been completed or aborted?
  • Alternatively, we could use BLOCKED with a different unblock_strategy that unblocks the work request once its children have been completed or aborted. (This only works if we don't care about the unblock_strategy that was in place before that work request started, though.)
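To make the BLOCKED_CHILDREN idea a bit more concrete, here's a rough sketch of the state transitions I have in mind. It's a toy in-memory model, not debusine code: the WorkRequest class, the Status enum, and the finish_own_work/abort methods are all made up for illustration, and the real scheduler would obviously do this against the database.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    BLOCKED_CHILDREN = "blocked_children"  # hypothetical new state
    COMPLETED = "completed"
    ABORTED = "aborted"


FINISHED = {Status.COMPLETED, Status.ABORTED}


@dataclass(eq=False)
class WorkRequest:
    """Toy stand-in for a work request; not the real debusine model."""

    name: str
    status: Status = Status.PENDING
    parent: WorkRequest | None = field(default=None, repr=False)
    children: list[WorkRequest] = field(default_factory=list)

    def add_child(self, name: str) -> WorkRequest:
        child = WorkRequest(name, parent=self)
        self.children.append(child)
        return child

    def finish_own_work(self) -> None:
        """The task itself (e.g. the orchestrator) has finished running."""
        if all(c.status in FINISHED for c in self.children):
            self._mark(Status.COMPLETED)
        else:
            # Frees the worker slot, but the request is not completed yet.
            self.status = Status.BLOCKED_CHILDREN

    def abort(self) -> None:
        self._mark(Status.ABORTED)

    def _mark(self, status: Status) -> None:
        self.status = status
        parent = self.parent
        # Clear the parent's BLOCKED_CHILDREN state once every child
        # has been completed or aborted.
        if parent is not None and parent.status is Status.BLOCKED_CHILDREN:
            if all(c.status in FINISHED for c in parent.children):
                parent._mark(Status.COMPLETED)


root = WorkRequest("workflow")
build = root.add_child("build")
lint = root.add_child("lint")

root.finish_own_work()   # orchestrator done, children still pending
build.finish_own_work()  # one child done, root still waiting
lint.abort()             # last child aborted, root is now completed
```

With something like this, the root work request returned by debusine create-workflow would only reach COMPLETED once the whole workflow has finished, so waiting on that one ID would be enough.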

Either way, this would mean a bit more complexity in the scheduler, but it would make workflows easier to use, which I think would be a good trade-off.
