
Interrupt running work requests on workers when they have been aborted on the server side

As an extension to #384 (closed), we want to improve server-worker cooperation so that when a work request is aborted on the server, the worker is notified and immediately stops executing it (without reporting any result).

The idea is that aborting a work request that takes multiple hours to complete should free the worker immediately, instead of letting it run the work request to completion only for the result to be discarded afterwards.

This should be implemented at least for workers processing "worker tasks". Signing workers could be handled too, but it is less of an issue there since signing tasks are relatively quick to execute.

Implementation plan

  • Introduce a new command in the server -> worker protocol used over the websocket connection
    • The command "ABORT WORK-REQUEST-ID" includes the work request ID to avoid a race condition: if the worker has just completed the aborted work request and picked up a new one, checking the ID ensures that we don't interrupt the new work request by mistake.
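The worker-side handling of this command could look roughly like the sketch below. It is a minimal illustration, not the project's implementation: it assumes the worker runs each work request as an asyncio task and receives the command as a plain "ABORT <id>" text message; the `Worker` class, its `start`/`handle_command` methods and the `asyncio.sleep` placeholder for real execution are all hypothetical. The key point is the ID comparison: a stale ABORT for a request that already finished is ignored, so it cannot hit the newly picked one.

```python
import asyncio
from typing import Optional


class Worker:
    """Hypothetical worker state; names are illustrative, not a real API."""

    def __init__(self) -> None:
        self.current_id: Optional[int] = None
        self.current_task: Optional[asyncio.Task] = None
        self.reported: list[int] = []  # IDs whose results were reported

    def start(self, work_request_id: int, duration: float) -> asyncio.Task:
        """Begin executing a work request as a cancellable asyncio task."""
        self.current_id = work_request_id
        self.current_task = asyncio.create_task(
            self._execute(work_request_id, duration)
        )
        return self.current_task

    async def _execute(self, work_request_id: int, duration: float) -> None:
        try:
            await asyncio.sleep(duration)  # placeholder for real task execution
            # Only reached on normal completion: report the result.
            self.reported.append(work_request_id)
        finally:
            # Completed or cancelled, the worker is now idle.
            self.current_id = None
            self.current_task = None

    def handle_command(self, message: str) -> bool:
        """Handle an assumed 'ABORT <id>' message.

        Returns True only if the currently running work request was interrupted.
        """
        parts = message.split()
        if len(parts) != 2 or parts[0] != "ABORT":
            return False
        try:
            target = int(parts[1])
        except ValueError:
            return False
        # Cancel only if the ID matches the running request; a late ABORT for
        # an already-completed request must not interrupt the new one.
        if self.current_task is not None and self.current_id == target:
            self.current_task.cancel()  # CancelledError skips the report
            return True
        return False


async def demo() -> tuple[bool, bool, list[int]]:
    worker = Worker()
    await worker.start(1, 0.01)  # request 1 completes and is reported
    task = worker.start(2, 3600)  # request 2 would run for an hour
    await asyncio.sleep(0)  # let request 2 begin executing
    stale = worker.handle_command("ABORT 1")  # arrives late: ignored
    fresh = worker.handle_command("ABORT 2")  # interrupts request 2
    try:
        await task
    except asyncio.CancelledError:
        pass  # request 2 stopped without reporting a result
    return stale, fresh, worker.reported


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (False, True, [1])
```

Cancelling the task rather than setting a flag checked by the task means the interruption takes effect at the next `await` point; a real task that runs blocking subprocesses would additionally need to terminate those processes.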

Open questions

  • Is it possible/desirable to do this for server tasks too?

(issue split off from #384 (closed))
