Unprivileged containers on porterboxes
At the moment, Debian porterboxes provide chroot environments for developer access, using schroot. Developers with access to these machines can install arbitrary packages from the Debian archive using dd-schroot-cmd, but cannot be root in a container: this is because schroot isn't really a container manager, so giving us root inside the chroot would be equivalent to giving us root on the real system. For many of the things that we might want to debug on a porterbox, it would be useful to be able to install modified packages without having to upload them to the real archive: sometimes it's possible to use tricks with LD_LIBRARY_PATH, but that's not always feasible.
If we could run system-like containers using podman, then that would be sufficient to be able to install, modify and debug the majority of user-space packages: everything except the kernel, bootloaders, and container/namespace tools like podman itself (because running an unprivileged container inside an unprivileged container is usually not possible). Depending on configuration, a podman container can either behave like a better-isolated chroot with only a subtree of processes, or share only the kernel with the host system and run its own OS from the init system up (I've successfully used systemd and sysv-rc inside a podman container).
Unprivileged lxc has similar requirements and features. I'm suggesting podman specifically because lxc/lxd seem to be used mostly within the Debian/Ubuntu bubble, which I think raises the risk of them ending up only minimally maintained by the same few oversubscribed people who maintain other Debian infrastructure, much like schroot is now. podman is compatible with Docker and OCI container images, which are widely used outside Debian.
This would require the DSA team to be willing to set up:
- /proc/sys/kernel/unprivileged_userns_clone set to 1 (its Debian 11 default) instead of 0 (currently set by the sysadmins to harden these systems)
- /proc/sys/user/max_user_namespaces set high enough (it is already, at least on barriere)
- a block of 64K uids in /etc/subuid per developer, normally outside the usual 16-bit range
  - for example, one good allocation scheme might be to give my uid 2912 access to uids (2912×65536) up to (2912×65536 + 65535) inclusive, using the top 16 bits as a "container ID" to identify the owner, and the low 16 bits as the uid used inside the container
- a similar block of 64K gids in /etc/subgid per developer
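
The example allocation scheme for /etc/subuid and /etc/subgid can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a proposal for how DSA would generate the files: the function names and the login "exampledev" are hypothetical, and it assumes every developer uid is non-zero and fits in 16 bits, so that the resulting sub-uids fit in 32 bits. The output line uses the name:start:count format that /etc/subuid and /etc/subgid expect.

```python
SUBID_COUNT = 1 << 16  # 64K ids per developer, as suggested in the list above


def subid_block(uid):
    """Return (first, last) sub-id for a developer uid, both inclusive.

    Assumption: uid is in 1..65534, so the block stays inside 32 bits and
    never overlaps the host's own 16-bit uid range.
    """
    if not 0 < uid < SUBID_COUNT - 1:
        raise ValueError(f"uid {uid} is outside the assumed 16-bit range")
    first = uid * SUBID_COUNT
    return first, first + SUBID_COUNT - 1


def subid_line(login, uid):
    """Format one /etc/subuid (or /etc/subgid) entry: name:start:count."""
    first, _last = subid_block(uid)
    return f"{login}:{first}:{SUBID_COUNT}"


def owner_of(subid):
    """Split a sub-id into (owner uid, id as seen inside the container)."""
    return subid >> 16, subid & 0xFFFF


if __name__ == "__main__":
    print(subid_block(2912))               # (190840832, 190906367)
    print(subid_line("exampledev", 2912))  # exampledev:190840832:65536
    print(owner_of(190840832 + 1000))      # (2912, 1000)
```

One attraction of this layout is that the owner of any stray file or process can be read straight off its numeric uid, without consulting /etc/subuid.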
Alternatively, if we had access to /dev/kvm, then we could use qemu/kvm virtual machines on architectures that support them.
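
A developer could check which of these options a given porterbox currently offers with something like the following sketch. The function names and the returned dictionary are hypothetical. It also makes two assumptions that are not from the proposal above: that "set high enough" can be approximated as "greater than zero", and that a missing unprivileged_userns_clone file means the kernel has no such restriction (that knob is a Debian-specific kernel patch).

```python
import os
from pathlib import Path


def read_sysctl_int(path):
    """Return the integer stored in a /proc/sys file, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def check_prerequisites(proc_sys="/proc/sys", kvm_device="/dev/kvm"):
    """Report whether unprivileged user namespaces or KVM look usable."""
    userns_clone = read_sysctl_int(
        os.path.join(proc_sys, "kernel", "unprivileged_userns_clone"))
    max_userns = read_sysctl_int(
        os.path.join(proc_sys, "user", "max_user_namespaces"))
    return {
        "unprivileged_userns_clone": userns_clone,
        "max_user_namespaces": max_userns,
        # Assumption: a missing knob (None) is treated as "not restricted",
        # and any positive namespace limit counts as "high enough".
        "userns_allowed": userns_clone != 0 and (max_userns or 0) > 0,
        # Using qemu/kvm needs read and write access to the device node.
        "kvm_usable": os.access(kvm_device, os.R_OK | os.W_OK),
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

This only inspects the host configuration; it does not check for the /etc/subuid and /etc/subgid entries, which podman also needs before it can map more than a single uid into a container.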
There is some extra security exposure involved in this: there have been several kernel security vulnerabilities that could only be exploited by an attacker who is able to create new user namespaces, and Debian machines are currently protected from those. However, switching to a userns-based approach would also reduce security exposure in the long term by removing schroot (setuid root) and dd-schroot-cmd (I'm not sure how this one works, but presumably something involving setuid) from the trusted computing base.