debusine.server.tests.test_consumers tests are flaky (<asyncio.locks.Lock object at 0xxxxxxxxxxxxx [locked]> is bound to a different event loop)
The following tests are failing frequently and intermittently:
- debusine.server.tests.test_consumers.WorkRequestCompletedConsumerTests.test_work_request_completed_selected_workspace
- debusine.server.tests.test_consumers.WorkRequestCompletedConsumerTests.test_work_request_completed_workspace_not_monitored
- debusine.server.tests.test_consumers.WorkerConsumerTransactionTests.test_disconnect_leaves_work_request_as_running
- debusine.server.tests.test_consumers.WorkerConsumerTransactionTests.test_request_dynamic_metadata_after_connect
- debusine.server.tests.test_consumers.WorkerConsumerTransactionTests.test_work_request_assigned
- debusine.server.tests.test_consumers.WorkerConsumerTransactionTests.test_worker_disabled
With the same error:
RuntimeError: <asyncio.locks.Lock object at 0x7f8b5240e250 [locked]> is bound to a different event loop
test_work_request_completed_selected_workspace and test_work_request_completed_workspace_not_monitored seem to be the root and cause the rest of the tests to fail similarly.
This looks like a testing-only error, because it happens when the whole asyncio stack is shut down and then restarted for the next test. Non-test reports welcome.
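For illustration, here is a minimal sketch of how this kind of error can arise outside debusine. It is not the debusine code path: it assumes Python 3.10 or later (where an asyncio.Lock remembers the loop it was first contended on), and the function names are hypothetical. A lock is left held when one event loop shuts down, then acquired again from a fresh loop, much like state leaking from one test into the next.

```python
import asyncio
import contextlib


async def leave_lock_held() -> asyncio.Lock:
    """Return a lock that is still held and bound to the current loop."""
    lock = asyncio.Lock()
    await lock.acquire()  # uncontended acquire: succeeds immediately
    # A second, contended acquire makes the lock remember the running loop.
    waiter = asyncio.ensure_future(lock.acquire())
    await asyncio.sleep(0)  # let the waiter start and block on the lock
    waiter.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await waiter
    return lock  # never released, like state leaking out of a test


async def acquire_again(lock: asyncio.Lock) -> None:
    """Try to take the leaked lock from whichever loop is running now."""
    await lock.acquire()


def reproduce() -> str:
    """Use the same lock from two successive event loops."""
    lock = asyncio.run(leave_lock_held())  # first loop, closed on return
    try:
        asyncio.run(acquire_again(lock))  # second, different loop
    except RuntimeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(reproduce())
```

On Python 3.10+ this should print a message of the same form as the one above. Whether the failing tests hit exactly this pattern is not established here; the sketch only shows the general mechanism.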
Recap 2024-02:
- Switching to in-memory (non-redis) channel layer appears to fix the issue, hinting at an issue with channels-redis; however doing so is not recommended for production, and probably will silently break IPC between the server and the CLI;
- Dropping sync_to_async(async_to_sync(...)) code paths appears to fix the issue; it relies on threads and hints at a threading issue, although this should be supported (asgiref knows how to get back to the original thread in this scenario);
- python3-channels-redis >= 4.1 improves but does not fix the situation, hinting at multiple causes for this issue;
- A couple of clean-ups in !389 (merged) and !425 (merged) improve but do not fully fix the situation;
- A minimal PoC attempt https://salsa.debian.org/freexian-team/debusine/uploads/ac7c0d486e559725cf2b22ccaed762a4/mysite.zip doesn't reproduce the issue so far;
- !594 (merged) improves but does not fully fix the situation;
- !928 (merged) may have "fixed" the issue.
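As a rough sketch of the first recap item, this is what the two channel layer configurations look like in a Django settings module. The backend paths are the ones shipped by channels and channels-redis; the variable names, the Redis host and port, and the helper function are assumptions for illustration, not debusine's actual settings.

```python
# Redis-backed layer: works across processes (assumed host and port).
CHANNEL_LAYERS_REDIS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {"hosts": [("127.0.0.1", 6379)]},
    }
}

# In-memory layer: single process only, so not suitable for production.
CHANNEL_LAYERS_IN_MEMORY = {
    "default": {"BACKEND": "channels.layers.InMemoryChannelLayer"}
}


def pick_channel_layers(in_memory: bool) -> dict:
    """Hypothetical helper: choose which layer configuration to use."""
    return CHANNEL_LAYERS_IN_MEMORY if in_memory else CHANNEL_LAYERS_REDIS


# In settings.py, one of the two would be assigned to CHANNEL_LAYERS.
CHANNEL_LAYERS = pick_channel_layers(in_memory=False)
```

Because the in-memory layer does not cross process boundaries, switching to it only helps narrow down the cause; it does not replace the Redis layer where separate processes need to exchange messages.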
Sample failures:
- 2023-12-08 https://salsa.debian.org/jspricke/debusine/-/jobs/5013993 (autopkgtest)
- 2023-12-08 https://salsa.debian.org/beuc/debusine/-/jobs/5014974 (unit-tests)
- 2024-01-05 https://salsa.debian.org/beuc/debusine/-/jobs/5120793 (autopkgtest-trixie)
- 2024-01-10 https://salsa.debian.org/beuc/debusine/-/jobs/5139916 (unit-tests)
- 2024-01-16 https://salsa.debian.org/freexian-team/debusine/-/jobs/5166884 (unit-tests-pip)
- 2024-04-10 https://salsa.debian.org/beuc/debusine/-/jobs/5571768 (autopkgtest)
- 2024-04-16 https://salsa.debian.org/freexian-team/debusine/-/jobs/5597806 (autopkgtest)
Edited by Sylvain Beucler