Re: Moving out of the kernel to improve performance?
You are right that the extent of ring involvement depends on the OS (and on the CPU architecture as well, actually). Both Linux and Windows use two rings, one for the kernel and one for userspace. I am not sure about the others (I remember hearing that OpenBSD uses all 4 of the x86 rings, but I have no idea if that is true).
I will admit I was looking at this a while ago, when I implemented RDMA over FireWire as a poor man's InfiniBand for clustering. Back then it was not possible to access the hardware from userspace without essentially writing a shim kernel module that would sit and pass the needed data between kernel and user space, which brings back the overheads I mentioned.
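To give a rough feel for that overhead, here is a minimal sketch (not the shim module itself, and not a rigorous benchmark) that pushes a small buffer through a kernel pipe and back, and compares that against copying the same buffer purely in user space. The function names, payload size and round count are my own choices for illustration, and it assumes a POSIX-like system.

```python
import os
import time


def time_pipe_round_trips(payload: bytes, rounds: int) -> float:
    """Send payload through a kernel pipe and read it back, `rounds` times.

    Each iteration costs two system calls and two copies across the
    user/kernel boundary. Returns the elapsed wall-clock time in seconds.
    """
    read_fd, write_fd = os.pipe()
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            os.write(write_fd, payload)           # user -> kernel copy
            os.read(read_fd, len(payload))        # kernel -> user copy
        return time.perf_counter() - start
    finally:
        os.close(read_fd)
        os.close(write_fd)


def time_user_copies(payload: bytes, rounds: int) -> float:
    """Copy payload inside the process, `rounds` times, with no system call."""
    start = time.perf_counter()
    for _ in range(rounds):
        bytearray(payload)                        # plain user-space copy
    return time.perf_counter() - start


if __name__ == "__main__":
    data = b"x" * 256                             # small, assumed payload size
    n = 100_000
    print(f"through the kernel: {time_pipe_round_trips(data, n):.4f} s")
    print(f"user space only:    {time_user_copies(data, n):.4f} s")
```

The absolute numbers depend heavily on the machine and on interpreter overhead, so only the relative gap between the two lines is worth reading.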
Now, that is Linux-specific; however, any monolithic kernel design by its nature has to have all userspace stuff go through the kernel. GNU Hurd goes to show that it is possible to have user-space device drivers without the overhead, but the kernel has to be designed for it.
The MMU is a hardware device (nowadays integrated into the CPU die) which handles memory translation at the low level. Not only is it already low latency, but both the kernel and userspace use it all the time, so there is no difference between user and kernel space in this context. The difference is that userspace goes through an additional layer, the VMM (virtual memory manager), so that each process sees its own virtual address space. Only the kernel (which doesn't use the VMM) sees the real address space and, lacking the extra indirection that userspace has to travel through, has lower latency.
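The "each process sees its own virtual address space" part can be observed from user space. The sketch below assumes a POSIX system where os.fork is available; the function name is made up for illustration. Parent and child report the same virtual address for a buffer, yet the child's write never shows up in the parent, so the same address must be backed by different memory in each process.

```python
import ctypes
import os


def private_address_space_demo():
    """Fork, let the child overwrite a buffer, and compare both views.

    Returns (parent_addr, parent_value, child_addr, child_value).
    """
    buf = ctypes.c_int(1)
    parent_addr = ctypes.addressof(buf)           # virtual address in the parent

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: same virtual address, but the write lands in a private copy.
        try:
            os.close(read_fd)
            buf.value = 2
            report = f"{ctypes.addressof(buf)} {buf.value}"
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)                           # never return into the caller

    # Parent: collect the child's report and reap it.
    os.close(write_fd)
    report = os.read(read_fd, 64).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)

    child_addr, child_value = (int(field) for field in report.split())
    return parent_addr, buf.value, child_addr, child_value


if __name__ == "__main__":
    p_addr, p_val, c_addr, c_val = private_address_space_demo()
    print(f"parent: address {p_addr:#x} holds {p_val}")
    print(f"child:  address {c_addr:#x} holds {c_val}")
```

This only shows the isolation between processes; it says nothing about how much latency the translation itself adds.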