Add a helper script to tee stdout/stderr and kill background processes
-
build: Have a separate directory for scripts to run in the testbed
I'm about to introduce another script that will have similar handling.
-
lib: Add a way to copy files to the testbed without changing ownership
When copying helper scripts to the testbed, we don't really want them to be owned by an unprivileged test user or world-writable: ideally they'll be root:root 0755.
-
Add a helper script to tee stdout/stderr and kill background processes
Previously, this was open-coded in the shell script that is created dynamically in Testbed.run_test (which should eventually all get absorbed into wrapper.sh, but that's a job for another MR).
The version that was open-coded in the shell script did not wait for the two tee subprocesses to run to completion before exiting. This created a race condition: if the test exits and the testbed is cleaned up quickly while the tee processes are scheduled slowly, test-case output can still be buffered in the tee processes (and is therefore lost) when they are killed as the testbed is torn down.
We can solve this by running the tee processes explicitly as subprocesses instead of using bash process substitution, and then waiting for them to exit.
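As a rough illustration of that idea (a minimal sketch, not the actual wrapper.sh; the function name run_with_tee and the use of named pipes in a temporary directory are assumptions made for this example), the tee processes can be started as ordinary background jobs, so that their pids are known and can be waited for:

```shell
#!/bin/sh
# Hypothetical sketch: run a command with its stdout/stderr copied to
# files via tee, and wait for both tee processes before returning.
# Usage: run_with_tee STDOUT_FILE STDERR_FILE COMMAND [ARGS...]
run_with_tee() {
    out_file="$1"
    err_file="$2"
    shift 2

    # Named pipes replace bash's >(...) process substitution.
    fifo_dir="$(mktemp -d)"
    mkfifo "$fifo_dir/out" "$fifo_dir/err"

    # Start the tee processes as explicit background jobs and
    # remember their pids so that we can wait for them later.
    tee "$out_file" < "$fifo_dir/out" &
    out_pid=$!
    tee "$err_file" < "$fifo_dir/err" >&2 &
    err_pid=$!

    # Run the command, remembering its exit status.
    tee_rc=0
    "$@" > "$fifo_dir/out" 2> "$fifo_dir/err" || tee_rc=$?

    # The tee processes see EOF once every writer has closed the pipes;
    # waiting here ensures they have flushed everything they read.
    wait "$out_pid" "$err_pid"
    rm -r "$fifo_dir"
    return "$tee_rc"
}
```

For example, `run_with_tee stdout.log stderr.log ./some-test` would leave complete copies of the output in the two log files before returning the exit status of `./some-test`. On its own, this sketch would block in `wait` for as long as any leftover process still holds one of the pipes open.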
However, a naive approach to this would result in the test hanging indefinitely if the test script launches background processes that inherit the test script's stdout/stderr: if that happens, the tee processes will not exit until those background processes have all exited (or at least closed stdout/stderr), similar to LP #1488359. Arguably tests that do this are broken, but because Debian uses lxc and Ubuntu uses ssh in practice, and those both have a workaround for this, we do not know how many tests are affected.
So, we need to locate and kill the background processes, similar to 6905b11d "adt-virt-lxc: In the auxverb, clean up leaked background processes which share the same stdout/stderr" and 3f685f54 "adt-virt-ssh: Kill dangling processes on ssh tty".
Because the contents of the testbed are not under our control, we have to do this with one hand tied behind our backs, using only Essential functionality. I've used find(1) instead of ls(1), as suggested by shellcheck; this seems reasonably likely to be robust.
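To illustrate the general approach (a sketch under assumptions, not necessarily how wrapper.sh does it: the function names are hypothetical, and `-L` with `-samefile` relies on GNU find from findutils), processes that still hold a given file or pipe open can be located by scanning /proc/PID/fd with find(1) and then killed:

```shell
#!/bin/sh
# Hypothetical sketch: print the pid of every process that has FILE
# open on any file descriptor. Requires Linux /proc and GNU find.
pids_holding_file() {
    target="$1"
    # -L follows the /proc/PID/fd/N symlinks, so -samefile compares the
    # file each descriptor refers to with the target itself. Errors from
    # processes that vanish or are not ours to inspect are ignored.
    find -L /proc/[0-9]*/fd -mindepth 1 -maxdepth 1 \
        -samefile "$target" 2>/dev/null |
        sed -n 's,^/proc/\([0-9]*\)/fd/.*,\1,p' |
        sort -u
}

# Hypothetical sketch: kill every process other than this shell that
# still has FILE open.
kill_holders() {
    for holder in $(pids_holding_file "$1"); do
        [ "$holder" = "$$" ] && continue
        kill "$holder" 2>/dev/null || :
    done
}
```

Calling something like `kill_holders "$fifo"` on the pipe that feeds a tee process, after the test itself has exited, would let that tee process see EOF and exit instead of waiting for leaked background processes.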
The virt-unshare backend is likely to be particularly susceptible to this, because it creates and destroys a pid namespace every time it runs an individual command in the testbed, only reusing the testbed's root filesystem. This means the command ends up as pid 1 in the namespace, and the tee commands are killed with SIGKILL by the kernel when their pid 1 exits, which happens a lot sooner than in other backends.
The virt-lxc, virt-lxd and virt-docker/virt-podman backends are much less susceptible to this, because they create a new container (set of namespaces) when the testbed is opened, with either an init system or a dummy 'sleep infinity' process as pid 1, and do not destroy it until the testbed is closed. They can still suffer from this race condition if the tee processes have not yet written all their output when Testbed.run_test copies the -stdout and -stderr files out of the testbed to the host, but that seems to happen less often in practice.
Applying this technique to the dpkg-buildpackage invocation would be nice, and would probably allow the similar workaround at a higher level in virt-lxc, virt-lxd and virt-ssh to be removed, but is outside the scope of this MR, in which I'm just trying to make our tests pass.
One other thing that needs to be absorbed into wrapper.sh at the same time is the creation of /tmp/autopkgtest_script_pid. The scripts written to /tmp/autopkgtest-reboot and /tmp/autopkgtest-reboot-prepare rely on this being the immediate parent of the test itself, and autopkgtest's test suite will fail if this is not true.
Closes: #1017974