Scheduling entirely broken with a completed root workflow and a pending sub-workflow
Right now, https://debusine.debian.net/debian/developers/work-request/206210/ (screenshot) shows this root workflow as "COMPLETED/ERROR", yet its "DebDiff" sub-workflow is still marked as "PENDING".
In this situation, I believe the scheduler detects the PENDING sub-workflow and tries to orchestrate it by re-running the orchestrator of the root workflow. Since the root workflow is already COMPLETED, this fails:
Oct 23 07:23:06 poseidon python3[1396465]: [2025-10-23 07:23:06,042: INFO/MainProcess] Task debusine.server.scheduler.schedule_task[289c4a01-a30b-4bba-9857-68e8c4ad79c5] received
Oct 23 07:23:06 poseidon python3[1396649]: [2025-10-23 07:23:06,783: ERROR/ForkPoolWorker-1] Task debusine.server.scheduler.schedule_task[289c4a01-a30b-4bba-9857-68e8c4ad79c5] raised unexpected: StatusChangeErr>
Oct 23 07:23:06 poseidon python3[1396649]: Traceback (most recent call last):
Oct 23 07:23:06 poseidon python3[1396649]: File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 453, in trace_task
Oct 23 07:23:06 poseidon python3[1396649]: R = retval = fun(*args, **kwargs)
Oct 23 07:23:06 poseidon python3[1396649]: ~~~^^^^^^^^^^^^^^^^^
Oct 23 07:23:06 poseidon python3[1396649]: File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 736, in __protected_call__
Oct 23 07:23:06 poseidon python3[1396649]: return self.run(*args, **kwargs)
Oct 23 07:23:06 poseidon python3[1396649]: ~~~~~~~~^^^^^^^^^^^^^^^^^
Oct 23 07:23:06 poseidon python3[1396649]: File "/usr/lib/python3/dist-packages/debusine/server/scheduler.py", line 428, in schedule_task
Oct 23 07:23:06 poseidon python3[1396649]: schedule()
Oct 23 07:23:06 poseidon python3[1396649]: ~~~~~~~~^^
Oct 23 07:23:06 poseidon python3[1396649]: File "/usr/lib/python3/dist-packages/debusine/server/scheduler.py", line 412, in schedule
Oct 23 07:23:06 poseidon python3[1396649]: result = list(schedule_internal())
Oct 23 07:23:06 poseidon python3[1396649]: ~~~~~~~~~~~~~~~~~^^
Oct 23 07:23:06 poseidon python3[1396649]: File "/usr/lib/python3.13/contextlib.py", line 85, in inner
Oct 23 07:23:06 poseidon python3[1396649]: return func(*args, **kwds)
Oct 23 07:23:06 poseidon python3[1396649]: File "/usr/lib/python3/dist-packages/debusine/server/scheduler.py", line 325, in schedule_internal
Oct 23 07:23:06 poseidon python3[1396649]: workflow_root.mark_running()
Oct 23 07:23:06 poseidon python3[1396649]: ~~~~~~~~~~~~~~~~~~~~~~~~~~^^
Oct 23 07:23:06 poseidon python3[1396649]: File "/usr/lib/python3/dist-packages/debusine/db/models/work_requests.py", line 2257, in mark_running
Oct 23 07:23:06 poseidon python3[1396649]: raise StatusChangeError(
Oct 23 07:23:06 poseidon python3[1396649]: ...<2 lines>...
Oct 23 07:23:06 poseidon python3[1396649]: )
Oct 23 07:23:06 poseidon python3[1396649]: debusine.db.models.work_requests.StatusChangeError: Cannot mark WorkRequest 206210 as running: current status is completed
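To make the suspected code path concrete, here is a heavily simplified, hypothetical model. `WorkRequest`, `get_root` and `orchestrate_pending_subworkflow` are illustrations, not debusine's real API, and the rule that only pending or running requests may be marked running is an assumption. Only the error message format is taken from the traceback above.

```python
from dataclasses import dataclass
from typing import Optional


class StatusChangeError(Exception):
    """Stand-in for debusine's StatusChangeError (simplified, hypothetical)."""


@dataclass
class WorkRequest:
    # Hypothetical, heavily simplified model: an id, a status string
    # and an optional link to the parent workflow.
    id: int
    status: str
    parent: Optional["WorkRequest"] = None

    def get_root(self) -> "WorkRequest":
        # Walk up the parent links until reaching the root workflow.
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def mark_running(self) -> None:
        # Assumption: only pending (or already running) requests may run.
        if self.status not in ("pending", "running"):
            raise StatusChangeError(
                f"Cannot mark WorkRequest {self.id} as running: "
                f"current status is {self.status}"
            )
        self.status = "running"


def orchestrate_pending_subworkflow(sub: WorkRequest) -> None:
    # Sketch of the suspected path: a pending sub-workflow causes its root
    # workflow to be marked running again before its orchestrator runs.
    root = sub.get_root()
    root.mark_running()
    # ... the root workflow's orchestrator would run here ...


root = WorkRequest(206210, "completed")
debdiff = WorkRequest(1, "pending", parent=root)  # hypothetical id
try:
    orchestrate_pending_subworkflow(debdiff)
except StatusChangeError as exc:
    error_message = str(exc)
```

In this model the completed root can never be re-entered, so as long as the sub-workflow stays pending the same error is raised on every attempt.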
Unfortunately, since the exception propagates out of schedule_task, this failure seems to stop all other scheduling work as well.
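The traceback shows schedule() collecting everything with `result = list(schedule_internal())`. The toy sketch below illustrates why a single failing item can block the whole run in that pattern. The generator body, the `(name, is_broken_root)` tuples and the work request names are hypothetical; only the `list(...)` call shape comes from the log.

```python
from collections.abc import Iterator


class StatusChangeError(Exception):
    """Simplified stand-in for debusine's StatusChangeError (hypothetical)."""


def schedule_internal(work: list[tuple[str, bool]]) -> Iterator[str]:
    # Hypothetical stand-in for the scheduler generator: yields each work
    # request it schedules. An entry flagged True plays the role of the
    # completed root workflow whose mark_running() call raises.
    for name, is_broken_root in work:
        if is_broken_root:
            raise StatusChangeError(f"Cannot mark {name} as running")
        yield name


def schedule(work: list[tuple[str, bool]]) -> list[str]:
    # Mirrors `result = list(schedule_internal())` from the traceback: an
    # exception inside the generator escapes list(), so the caller gets no
    # result at all and later entries are never visited.
    return list(schedule_internal(work))


work = [("wr-a", False), ("root-206210", True), ("wr-b", False)]
try:
    result = schedule(work)
except StatusChangeError:
    result = None  # the whole run is lost, not just the broken workflow
```

Because the broken root stays COMPLETED and its sub-workflow stays PENDING, every later scheduler run would presumably hit the same entry again. That would fit the observation that scheduling is stuck rather than failing just once.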
It's not clear to me how we get into this situation, but we likely have several such cases in our database right now. I also found https://debusine.debian.net/debian/developers/work-request/210950/, which was started recently by @stefanor.
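One way to list other affected workflows could look like the sketch below. It assumes a simplified in-memory export of work requests with hypothetical `id`, `status` and `parent_id` fields; the real debusine schema is not shown here and may differ. It flags each completed root workflow that still has a pending descendant, which is the combination seen in 206210.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    # Hypothetical flattened view of a work request.
    id: int
    status: str
    parent_id: Optional[int]


def find_inconsistent_roots(rows: list[Row]) -> set[int]:
    """Return ids of completed root workflows that have a pending descendant."""
    by_id = {row.id: row for row in rows}

    def root_of(row: Row) -> Row:
        # Follow parent links up to the root workflow.
        while row.parent_id is not None:
            row = by_id[row.parent_id]
        return row

    inconsistent: set[int] = set()
    for row in rows:
        # Only pending work requests that belong to some workflow matter.
        if row.status != "pending" or row.parent_id is None:
            continue
        root = root_of(row)
        if root.status == "completed":
            inconsistent.add(root.id)
    return inconsistent


rows = [
    Row(206210, "completed", None),  # root workflow from this report
    Row(1, "pending", 206210),  # hypothetical id for the DebDiff sub-workflow
    Row(2, "pending", None),  # hypothetical pending root: consistent
    Row(3, "running", None),  # hypothetical running root
    Row(4, "pending", 3),  # pending child of a running root: consistent
]
```

Calling `find_inconsistent_roots(rows)` on this sample flags only 206210. Against the real database, the same check would need to be expressed as a query over the actual work request tables.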