I've attached a description to each one which will provide some
guidance as to how to proceed with it. If you need more details,
contact me or send mail to the uml-devel list.
My attempts to run the Debian install procedure hung UML while it was
probing disks. I never figured out what was happening.
If someone verifies that UML can run a Debian installation, I'll
consider this fixed.
Make UML build without warnings
I've been knocking these off one by one. The remaining ones are
mostly cries for reorganization. There are pieces of userspace code
in kernel files which call libc functions without there being
declarations in scope. Mostly, these things need to be separated from
whatever kernel code they're in, and moved to a userspace file.
An exception to this is the pair of warnings from kernel/timer.c. I'm
going to fix this by bumping UML's HZ to 50.
Remember the errno redefinition warning in user_syms.c during the deb build.
Make sure that each clock tick gets counted
If you run UML under load for a while, you'll notice that clock ticks
are being missed. You can see this by doing a 'ps uax' every once in
a while, and noticing that process start times are moving. I made an
attempt to fix this by never having SIGVTALRM or SIGALRM blocked and
doing some fancy flag checking to make sure that the IRQ handler is
only called if it's safe, and otherwise just bumping a count of missed
ticks. The next time it's safe, the IRQ handler will be called once
for the current tick and once for each previously missed tick,
catching the system up with the real time.
This doesn't fix it for some reason. The only reason I can think of
is that the signals are getting stacked because a second one comes in
before the first one has been collected by the handler. I have a hard
time believing that the host can be so bogged down that it can't call
the signal handler within 1/HZ seconds of the alarm happening. The
signal is going through the tracing thread courtesy of ptrace, so
maybe that's introducing enough delay to stack the signals.
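A minimal sketch of the catch-up scheme described above, with
hypothetical names (alarm_tick, timer_irq, missed_ticks) rather than
UML's actual timer code. It only shows the bookkeeping: a tick that
arrives at an unsafe moment is counted, and the next safe tick replays
the backlog. It can do nothing about ticks the host merges before the
handler ever runs, since a pending standard signal is not queued a
second time.

```c
#include <assert.h>

/* Hypothetical names throughout; this is not UML's timer code. */
static unsigned long missed_ticks;  /* ticks that arrived while unsafe */
static unsigned long ticks_counted; /* ticks the IRQ handler has seen */

/* Stand-in for the real timer IRQ handler. */
static void timer_irq(void)
{
    ticks_counted++;
}

/*
 * Called for every alarm signal. If it is not safe to run the IRQ
 * handler, only record the tick. Otherwise run the handler once for
 * the current tick and once for each tick missed earlier.
 */
static void alarm_tick(int safe)
{
    if (!safe) {
        missed_ticks++;
        return;
    }
    timer_irq();
    while (missed_ticks > 0) {
        missed_ticks--;
        timer_irq();
    }
    assert(missed_ticks == 0); /* the backlog has been replayed */
}
```

In a real signal handler the counters would need to be safe against
reentry, for example by declaring them volatile sig_atomic_t.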
Figure out the hostfs crash that Laurent Bonnaud is seeing
Laurent hasn't complained about this recently. If he doesn't bring it
up again, I'll just consider this fixed.
make 'gdb=pty' work
Run UML with 'debug gdb=pty' on the command line, attach to whatever
pty gdb gets, and you'll see the problem. There's some major problem
with the terminal modes that I've never figured out.
protect kernel memory from userspace
Fixing this will make it impossible for a nasty UML user to escape
from UML and execute arbitrary system calls on the host. It has four parts:
Write-protect kernel physical and virtual memory whenever UML is in
userspace. The exception to this is the process kernel stack, which
needs to be writable in order for a signal handler to be run on it.
This is OK because whatever the process puts there will just be
overwritten by the signal handler. This is partially implemented -
kernel physical memory is protected, except for the four pages making
up the task structure and stack. Protecting the task structure is
complicated because there is some UML code which effectively runs in
userspace which calls the process signal handler to deliver signals.
It needs to be able to write to the task structure. I think the fix
is to move that code to before the signal starts being delivered,
which will involve redoing parts of the signal delivery code.
Protecting kernel virtual memory is a matter of walking the kernel
page tables and write-protecting everything that's mapped there.
Locating and disabling all drivers that provide access to kernel
memory through interfaces such as /proc/kcore.
Fixing the access_ok macro so it rejects any attempt to fake the
kernel into changing its own memory by passing a kernel address as an
argument to a system call.
In the host kernel, adding a new personality which segfaults any
attempt to execute a SysV system call. These are not traced by
ptrace, so a process that knows how to use them is not jailed inside UML.
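The access_ok item above comes down to a range check. The following is
a sketch of that check only, not the kernel's access_ok macro. The
KERNEL_START and KERNEL_END values and the name user_range_ok are
assumptions made for illustration; the real bounds depend on how UML
is built and loaded.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout: [KERNEL_START, KERNEL_END) stands in for the
 * range holding UML kernel memory. These values are made up.
 */
#define KERNEL_START ((uintptr_t)0xa0000000u)
#define KERNEL_END   ((uintptr_t)0xb0000000u)

/*
 * Return 1 if the buffer [addr, addr + size) lies wholly outside
 * kernel memory, 0 if it overlaps kernel memory or wraps around the
 * top of the address space.
 */
static int user_range_ok(uintptr_t addr, uintptr_t size)
{
    uintptr_t end = addr + size;

    assert(KERNEL_START < KERNEL_END);
    if (end < addr)     /* addr + size overflowed */
        return 0;
    /* Accept only ranges entirely below or entirely above the kernel. */
    return end <= KERNEL_START || addr >= KERNEL_END;
}
```

The overflow test matters: without it, a huge size could wrap the end
address back below KERNEL_START and slip past the comparison.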
Figure out why the io_thread loses its parent and its connection to the kernel
This happened once, a long time ago. The system hung because the IO
thread lost its connection to UML somehow. I haven't seen it since,
so this will probably be considered fixed.
Get either Myrtal Meep or Matt Clay
to confirm (or not) that they can no longer crash UML with
Apache/Perl/MySQL. This will probably be fixed with the new network drivers.
The new network drivers probably fixed these problems. This will be
considered fixed unless people start complaining about it again.
Disable SIGIO on file descriptors which are already being handled.
This will cut off some interrupt recursion.
This is an efficiency thing and stack depth precaution. We don't want
sigio_handler to run recursively on the same descriptor because the
descriptor will be handled at the top level.
Figure out why gdb can't use fd chan (email@example.com)
One of the ioctls that sets up the terminal for gdb returns EPERM.
Figure out why repeated 'tail /dev/zero' with swap causes process segfaults
I stopped being able to reproduce this, so it might be considered
mysteriously fixed at some point.
Set SA_RESTART on SIGPROF
This will prevent SIGPROF from causing system calls on the host from
returning EINTR and messing things up inside UML.
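A sketch of what setting the flag looks like with the standard
sigaction interface. The handler body and the names prof_handler and
install_sigprof are placeholders, not UML code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Placeholder handler; a real one would do the profiling work. */
static void prof_handler(int sig)
{
    (void)sig;
}

/*
 * Install a SIGPROF handler with SA_RESTART set, so that system
 * calls which support restarting are resumed after the handler
 * returns instead of failing with EINTR.
 * Returns 0 on success, -1 on error (errno is set by sigaction).
 */
static int install_sigprof(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = prof_handler;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPROF, &sa, NULL);
}
```

SA_RESTART does not cover every call. Some, such as sleeps and calls
with timeouts, still return EINTR, so callers of those need their own
retry logic.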
Allow registration of sockets for panics and console output
This involves some work on the mconsole client side to support making
the requests and work in UML to add the mconsole notification to the
panic_notify_chain and to somehow get console output to the client.
My best idea on doing this is to add support for multiple channels on
consoles and support for write-only channels. Then the mconsole
client can stack a write-only channel to a socket or something on top
of whatever channel is already there.
Replace dentry_name with d_name
Cleanup - I implemented dentry_name, not knowing about d_name, which
does the same thing. One problem might be that dentry_name expects
the file to exist, which might cause something to break when d_name
returns a '/foo/bar/baz (deleted)' name.
Bind a UML unix socket to a host unix socket
This is a cute feature which would involve passing something like
'hostsock=/tmp/.X11-unix/X0,X' on the command line, which would cause
the hostsock driver to create a /proc/hostsock/X unix socket inside
UML and pass all operations and data on it through to the host's
/tmp/.X11-unix/X0. Then, you'd create /tmp/.X11-unix/X0 in UML as a
link to /proc/hostsock/X and be able to run X clients on the host X
server without needing the network up.
Dynamically allocate all driver descriptors
The static arrays in the drivers should go away and be replaced by
dynamically allocated buffers, maybe in a list. This will turn
lookups of particular devices into O(N) operations, but maybe that's
OK. Also, the transport arrays in the various transports should
similarly go away.
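A rough sketch of the list-based alternative, written as ordinary
userspace C. The struct dev_desc layout and the dev_add, dev_find and
dev_remove names are invented for illustration, and malloc stands in
for whatever allocator the drivers would really use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical descriptor; real drivers carry far more state. */
struct dev_desc {
    int index;              /* device number, e.g. 0 for eth0 */
    struct dev_desc *next;
};

static struct dev_desc *dev_list;   /* head of the list */

/* Allocate a descriptor and push it on the list. NULL on failure. */
static struct dev_desc *dev_add(int index)
{
    struct dev_desc *d = malloc(sizeof(*d));

    if (d == NULL)
        return NULL;
    d->index = index;
    d->next = dev_list;
    dev_list = d;
    return d;
}

/* O(N) lookup by device number. NULL if there is no such device. */
static struct dev_desc *dev_find(int index)
{
    struct dev_desc *d;

    for (d = dev_list; d != NULL; d = d->next)
        if (d->index == index)
            return d;
    return NULL;
}

/* Unlink and free one descriptor. Returns 0 if found, -1 otherwise. */
static int dev_remove(int index)
{
    struct dev_desc **p;

    for (p = &dev_list; *p != NULL; p = &(*p)->next) {
        if ((*p)->index == index) {
            struct dev_desc *dead = *p;

            *p = dead->next;
            free(dead);
            return 0;
        }
    }
    return -1;
}
```

With only a handful of devices per instance, the linear walk in
dev_find is unlikely to matter.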
Make slip_tramp conditional in slip_close
slip_open doesn't always invoke the helper through slip_tramp, so
slip_close similarly shouldn't.
The real problem here is that when a gateway address isn't specified for
slip, it just doesn't work at all.
many du / causes slab cache corruption
This wasn't entirely reproducible, but I did see a couple cases of
panics with slab corruption when running a lot of du's. I fixed the
process segfault problem, and this problem has probably disappeared as
Adding an eth device via the mconsole (and probably the command line)
won't necessarily configure the device that was named
The kernel does its own ordering of devices, so configuring eth1 without
configuring eth0 gives you eth0.
This would let UML be configured on the inside with knowledge of what
instance it is.
Fix the ubd rounding bug spotted by Steve Schmidtke and Roman Zippel.
If the number of sectors isn't a multiple of 8, then the bitmap will
be a byte shorter than it should be.
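The arithmetic behind this, as a small sketch assuming one bit per
sector (bitmap_bytes is a made-up name, not the ubd driver's): 17
sectors need 3 bytes, but plain integer division 17 / 8 gives 2.
Rounding up fixes the size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of bytes needed for a bitmap with one bit per sector.
 * A partial final byte counts as a whole byte.
 */
static size_t bitmap_bytes(size_t sectors)
{
    return sectors / 8 + (sectors % 8 != 0);
}
```

The more common spelling (sectors + 7) / 8 gives the same result but
can overflow for sector counts near the top of the type's range.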
gdb should be the last thing to be shut down so that breakpoints can
be set late in the shutdown process.
Have the __uml_exitcall handlers check to see if they need to do anything.
If UML is ^C-ed before things are set up, then the cleanup handlers are
cleaning up after things which were never initialized.
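One way to do the check is a per-subsystem flag that setup sets and
the exit handler tests. This sketch uses invented names
(console_setup, console_exit) and plain functions in place of the
__uml_exitcall machinery.

```c
#include <assert.h>

/* Hypothetical subsystem state; set only once setup has completed. */
static int console_initialized;
static int cleanups_done;   /* counts real cleanups, for illustration */

/* Stand-in for the subsystem's setup code. */
static void console_setup(void)
{
    console_initialized = 1;
}

/* Exit handler that is safe to call even if setup never ran. */
static void console_exit(void)
{
    if (!console_initialized)
        return;     /* nothing was set up, so nothing to undo */
    console_initialized = 0;
    cleanups_done++;
    assert(cleanups_done > 0);  /* counter did not wrap */
}
```

Clearing the flag in the handler also makes a second call harmless.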
Find the race that causes the kernel to run before the debugger is running,
and which then causes a panic and segfault.
Tests that need writing
Build and load some modules to check for unexported symbols
Swap testing - qsbench, low-memory kernel build
Rerun some existing tests through hostfs
Figure out why gdb inside UML sometimes misses breakpoints.
^C doesn't work well in a busy loop in __initcall code. I've seen the
process die with a SIGINT as well as have the SIGINT arrive at a
sleeping process, panicking the tracing thread.
When setting a conditional breakpoint on __free_pages in
free_all_bootmem_core and continuing, UML hangs.
It's not a hang. The conditional breakpoint is in a long loop, and that
makes it take a while. ^C to gdb seems to be ignored for some reason. UML
sees it and passes it right along to gdb, which does nothing about it.
Figure out what to do with __uml_setup in modules. A lot of these should
end up being normal __setup, since they don't have to happen before the
Figure out why UML died nastily after ^C-ing it and hitting it in userspace.
Telnetting to a console when in.telnetd is non-executable produces an
error and a crash from a sleeping process segfaulting.
Single-stepping when a signal has arrived doesn't work. gdb
single-steps, sees the signal, and single-steps with the signal. The
signal is handled immediately, stepping the process into the handler.
The original instruction was never executed, so when gdb puts the
breakpoint back and the handler returns, the breakpoint is hit again.
The _to_user routines refer to user data without using copy_to_user.
How to cause a strange userspace segfault - run the new UML inside UML
under gdb. Put a breakpoint on create_elf_tables. 'finish', 'next',
With an old port-helper hanging on to a port, running a UML which wants those
ports causes all of the consoles and serial lines to output their login
prompts to stdout.
Assigning console and serial line devices to files doesn't work cleanly.
UML tries to register the descriptors for SIGIO. Some other mechanism is
needed for feeding the file data into userspace.
Make sure that irqs are deactivated or freed when their descriptors are
Things to audit:
Make sure that destructors exactly undo everything the constructor does
Set FD_CLOEXEC on any descriptors that don't need to be passed across execs.
Make sure any protocols are 64-bit clean - this means not using ints, longs,
etc. Also, maybe enums are bad.
port_kern.c has port_remove_dev and port_kern_free which do almost the
Make sure that return values are actually used.
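For the FD_CLOEXEC audit item above, this is the usual fcntl pattern,
shown as a sketch. The helper name set_cloexec is an assumption, not
an existing UML function. It also checks both return values, in
keeping with the last item.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Mark fd close-on-exec while preserving its other descriptor flags.
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        return -1;
    return 0;
}
```

Descriptors that a helper process is meant to inherit should be left
alone.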
skas4 things - remove the extra two context switches per system call,
make sure it compiles with CONFIG_PROC_MM off, implement the mm
indirector, move PTRACE_LDT to /dev/mm, fix PTRACE_FAULTINFO to return
trap type and err, allow the environment, argument, etc pointers of an
mm to be set.