
update_workflows can deadlock

I noticed this in debusine.debian.net's Celery logs:

Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: [2025-10-24 11:28:02,332: ERROR/ForkPoolWorker-13] Task debusine.server.celery.update_workflows[4076a180-8d40-4363-b300-246a59622964] raised unexpected: OperationalError('deadlock detected\nDETAIL:  Process 3416289 waits for ShareLock on transaction 399621; blocked by process 3416288.\nProcess 3416288 waits for ShareLock on transaction 399625; blocked by process 3416289.\nHINT:  See server log for query details.\nCONTEXT:  while locking tuple (26941,7) in relation "db_workrequest"\n')
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: Traceback (most recent call last):
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/django/db/backends/utils.py", line 89, in _execute
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     return self.cursor.execute(sql, params)
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:            ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: psycopg2.errors.DeadlockDetected: deadlock detected
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: DETAIL:  Process 3416289 waits for ShareLock on transaction 399621; blocked by process 3416288.
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: Process 3416288 waits for ShareLock on transaction 399625; blocked by process 3416289.
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: HINT:  See server log for query details.
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: CONTEXT:  while locking tuple (26941,7) in relation "db_workrequest"
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: The above exception was the direct cause of the following exception:
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: Traceback (most recent call last):
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 453, in trace_task
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     R = retval = fun(*args, **kwargs)
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:                  ~~~^^^^^^^^^^^^^^^^^
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 736, in __protected_call__
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     return self.run(*args, **kwargs)
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:            ~~~~~~~~^^^^^^^^^^^^^^^^^
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3.13/contextlib.py", line 85, in inner
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     return func(*args, **kwds)
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/debusine/server/celery.py", line 156, in update_workflows
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     for workflow in WorkRequest.objects.filter(
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:         id__in=workflow_ancestor_ids
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     ).select_for_update():
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     ~~~~~~~~~~~~~~~~~~~^^
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/django/db/models/query.py", line 398, in __iter__
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     self._fetch_all()
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     ~~~~~~~~~~~~~~~^^
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/django/db/models/query.py", line 1881, in _fetch_all
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     self._result_cache = list(self._iterable_class(self))
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:                          ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/django/db/models/query.py", line 91, in __iter__
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     results = compiler.execute_sql(
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:         chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     )
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/django/db/models/sql/compiler.py", line 1562, in execute_sql
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     cursor.execute(sql, params)
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     ~~~~~~~~~~~~~~^^^^^^^^^^^^^
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/django/db/backends/utils.py", line 67, in execute
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     return self._execute_with_wrappers(
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:            ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:         sql, params, many=False, executor=self._execute
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     )
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     ^
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     return executor(sql, params, many, context)
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/django/db/backends/utils.py", line 84, in _execute
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     with self.db.wrap_database_errors:
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/django/db/utils.py", line 91, in __exit__
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     raise dj_exc_value.with_traceback(traceback) from exc_value
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:   File "/usr/lib/python3/dist-packages/django/db/backends/utils.py", line 89, in _execute
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:     return self.cursor.execute(sql, params)
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]:            ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: django.db.utils.OperationalError: deadlock detected
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: DETAIL:  Process 3416289 waits for ShareLock on transaction 399621; blocked by process 3416288.
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: Process 3416288 waits for ShareLock on transaction 399625; blocked by process 3416289.
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: HINT:  See server log for query details.
Fri 2025-10-24 11:28:02 UTC poseidon debusine-server-celery.service[2920800]: CONTEXT:  while locking tuple (26941,7) in relation "db_workrequest"

This doesn't seem fatal, because update_workflows is designed to catch up with all workflows that need an update, but it's noisy. I notice that update_workflows takes two locks in sequence: one on the work requests that are directly marked as needing an update, and a second on both those work requests and all their ancestors. That seems unnecessary, and I think it carries the risk of a sequence such as the following (sketched in code after the list), where A and B are work requests, A is the parent of B, and B has workflows_needs_update: True:

  • a workflow orchestrator takes an update lock on A
  • update_workflows takes an update lock on B
  • the workflow orchestrator above tries to take an update lock on B, and blocks behind update_workflows
  • update_workflows attempts to take an update lock on both A and B; it blocks on A, which completes the cycle, and PostgreSQL aborts it with "deadlock detected"
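Here is my hypothetical reconstruction of the current locking pattern, based on the traceback above and the behaviour described; the field name workflows_needs_update is taken from this report, and the iter_ancestors helper and the exact queries are my assumptions, not debusine's actual code:

    from django.db import transaction

    from debusine.db.models import WorkRequest


    def iter_ancestors(work_request):
        """Walk parent links to the root (assumes a self-referential parent FK)."""
        current = work_request.parent
        while current is not None:
            yield current
            current = current.parent


    @transaction.atomic
    def update_workflows_sketch():
        # First lock: only the work requests directly flagged for update.
        flagged = list(
            WorkRequest.objects.filter(
                workflows_needs_update=True
            ).select_for_update()
        )
        # Collect the flagged work requests plus all their ancestors.
        workflow_ancestor_ids = {wr.id for wr in flagged}
        for wr in flagged:
            workflow_ancestor_ids.update(a.id for a in iter_ancestors(wr))
        # Second lock, taken later in the same transaction.  Between the two
        # SELECT ... FOR UPDATE queries, another transaction (e.g. a workflow
        # orchestrator) can acquire locks of its own on an ancestor, creating
        # the A/B cycle described above.
        for workflow in WorkRequest.objects.filter(
            id__in=workflow_ancestor_ids
        ).select_for_update():
            ...  # bring the workflow's status up to date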

This is a classic deadlock. If instead update_workflows were to take its lock on both A and B at the same time, there would be no problem because either our hypothetical workflow orchestrator or update_workflows itself would block until the other one's transaction ends.
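A sketch of that single-lock alternative, under the same assumed names (and reusing the imports and iter_ancestors helper from the sketch above): gather all the IDs first with plain reads, then lock everything in one query:

    @transaction.atomic
    def update_workflows_single_lock_sketch():
        # Plain reads take no row locks, so gather the full ID set first...
        flagged = WorkRequest.objects.filter(workflows_needs_update=True)
        ids_to_lock = {wr.id for wr in flagged}
        for wr in flagged:  # Django caches the queryset; no second query
            ids_to_lock.update(a.id for a in iter_ancestors(wr))
        # ...then lock the flagged work requests and their ancestors in a
        # single SELECT ... FOR UPDATE.  update_workflows either gets all
        # the rows or waits behind whoever holds any of them, instead of
        # holding B while waiting for A.
        for workflow in WorkRequest.objects.filter(
            id__in=ids_to_lock
        ).select_for_update():
            ...  # bring the workflow's status up to date

As a general technique (not something this report depends on), ordering the locked rows consistently, e.g. with .order_by("id") before select_for_update(), also ensures that any two transactions that do overlap acquire their row locks in the same order.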
