Run workflow callbacks via the root workflow orchestrator
Yesterday I was working on the workflow changes for #756, and it occurred to me that the scheduler should probably handle workflow callbacks more like the way it handles sub-workflows: rather than running the callback directly, it should call the root workflow's orchestrator and let it run the callback in the same way that it's expected to re-orchestrate any sub-workflows.
At the moment, the scheduler only calls the root workflow's orchestrator if one of its workflows (including itself) is pending. Once it has been started, a workflow stays running until it's completed or aborted. That means that if a callback makes it possible to add new sub-workflows to the workflow's graph, it must add them directly; it can't just finish and leave the rest of the work to the populate method, because that will never be called again unless some other sub-workflow happens to become pending.
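To make the current trigger rule concrete, here is a minimal Python sketch. It uses a hypothetical `Node` model with hypothetical `kind` labels ("workflow" or "callback") and a hypothetical `current_needs_orchestration` helper. None of this is the scheduler's actual code. It only shows that a tree whose running workflows have a pending callback never causes the root orchestrator to be called again:

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"


@dataclass
class Node:
    """Hypothetical stand-in for a work request in a workflow tree."""
    name: str
    kind: str  # "workflow" or "callback" (hypothetical labels)
    status: Status = Status.PENDING
    children: list["Node"] = field(default_factory=list)


def walk(node: Node):
    """Yield node and all of its descendants, depth-first."""
    yield node
    for child in node.children:
        yield from walk(child)


def current_needs_orchestration(root: Node) -> bool:
    """Current rule: only a pending workflow (including the root itself)
    causes the root workflow's orchestrator to be called; a pending
    callback does not."""
    return any(
        n.kind == "workflow" and n.status is Status.PENDING for n in walk(root)
    )


# A running root with a running sub-workflow and a pending callback:
root = Node("root", "workflow", Status.RUNNING, [
    Node("sub", "workflow", Status.RUNNING),
    Node("cb", "callback"),
])
print(current_needs_orchestration(root))  # False: populate is not re-run
```

Under this rule, anything the callback wants added to the graph has to be added by the callback itself.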
While this approach could work, I find it unnecessarily confusing, and I think things would be easier to understand if both pending workflow callbacks and pending sub-workflows were handled by calling the root workflow's orchestrator. Orchestrators would just need to mark any of their pending workflow callbacks as running and call orchestrate_workflow on them, which is already how they handle their pending sub-workflows.
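The proposed flow can be sketched in Python as follows. Again, the `Node` model, the `kind` labels and `needs_orchestration` are hypothetical. `orchestrate_workflow` borrows only its name from the text, and its body is an assumption. Under the proposal, any pending node (workflow or callback) triggers a call to the root orchestrator. That orchestrator then treats pending callbacks exactly like pending sub-workflows: it marks them as running and calls `orchestrate_workflow` on them. For simplicity, the sketch assumes that a callback completes as soon as it runs.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"


@dataclass
class Node:
    """Hypothetical stand-in for a workflow, sub-workflow or callback."""
    name: str
    kind: str  # "workflow" or "callback" (hypothetical labels)
    status: Status = Status.PENDING
    children: list["Node"] = field(default_factory=list)


def needs_orchestration(root: Node) -> bool:
    """Proposed rule: any pending workflow or callback in the tree means
    the scheduler calls the root workflow's orchestrator."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.status is Status.PENDING:
            return True
        stack.extend(node.children)
    return False


def orchestrate_workflow(node: Node, log: list[str]) -> None:
    """Run a callback, or populate a workflow and then handle its pending
    children (sub-workflows and callbacks alike) in the same way."""
    if node.kind == "callback":
        log.append(f"callback {node.name}")
        # Simplifying assumption: the callback completes as soon as it runs.
        node.status = Status.COMPLETED
        return
    log.append(f"populate {node.name}")
    for child in node.children:
        if child.status is Status.PENDING:
            child.status = Status.RUNNING
            orchestrate_workflow(child, log)
```

With this shape, a callback can simply finish. Because the root orchestrator is the entry point, the workflow's populate step gets another chance to add whatever the callback made possible.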
The last time I touched the callback-handling code in the scheduler, I was focused on regression analysis, where the callback doesn't create any additional work requests and just runs some queries over the existing ones. The current arrangement does in principle allow parallelizing that between sub-workflows, which this proposal would lose; but that's probably quite a minor benefit, and if there were lots of things happening in parallel in a large workflow then there might be a trade-off against the cost of dispatching lots of Celery tasks. I think it would be OK to sacrifice that possibility in order to make workflow orchestration a bit easier to understand.