How IBM does it.
For some ideas on how MS may end up handling some of the issues, here is how IBM does it.
On the RS/6000 (pSeries) side of things, the hypervisor is a key part of the design, and the OS needs to be capable of handling the changes the hypervisor makes. I think this will end up being the case on the x86 side as well.
Partitions set up with the AIX operating system support dynamic allocation of RAM (add, subtract, move). Last I saw, Linux on POWER could not handle dynamic allocation of RAM.
Resources (adapters or processors) can be added to, moved between, or removed from both AIX and Linux partitions dynamically. For adapters, the target device first needs to be removed from the running configuration inside the OS before you can tell the hypervisor to move it.
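To make the two-step handoff concrete, here is a minimal sketch of the rule "the OS must let go of the device before the hypervisor reassigns it." The `Partition` and `Hypervisor` classes are made up purely for illustration; the real flow goes through tools like `rmdev`/`drmgr` inside AIX and the HMC on the hypervisor side.

```python
class Partition:
    """Stand-in for a running LPAR and its configured devices."""
    def __init__(self, name):
        self.name = name
        self.devices = set()

    def release(self, dev):
        # Step 1: remove the adapter from the running OS configuration
        # (what `rmdev -dl <device>` would do inside AIX).
        if dev not in self.devices:
            raise RuntimeError(f"{dev} is not configured in {self.name}")
        self.devices.remove(dev)


class Hypervisor:
    """Stand-in for the hypervisor/HMC side of the move."""
    def move_adapter(self, dev, src, dst):
        # Step 2: only reassign the slot once the source OS has let go.
        if dev in src.devices:
            raise RuntimeError(f"{src.name} still holds {dev}; release it first")
        dst.devices.add(dev)


lpar1, lpar2 = Partition("lpar1"), Partition("lpar2")
lpar1.devices.add("ent0")
hyp = Hypervisor()

lpar1.release("ent0")               # OS gives up the adapter first
hyp.move_adapter("ent0", lpar1, lpar2)
print("ent0" in lpar2.devices)      # the adapter now belongs to lpar2
```

Skipping the release step would make `move_adapter` raise, which mirrors why the hypervisor refuses to yank an adapter out from under a running OS.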
Processors can be added or subtracted with some loss of performance (you either lose some cached data on removal, or add an empty cache that has to warm up) plus extra context switching, depending on what you do.
This is all at the dedicated level. When you get into logical partitions and virtual devices it gets even more fun, but all the same stuff can be done.
Dynamic reallocation of resources is great because you do not need to bounce the running system to change things.
So Windows will have to support the ability to add/remove/move memory/processors/devices (even when virtualized), and there will need to be a decent hypervisor behind it all.
As for memory fault tolerance:
On its xSeries (x86) hardware, IBM likes to use Chipkill RAM. It can handle the failure of a single memory chip and keep on trucking. Each memory module has an extra 8-bit chip that acts as a sort of parity/RAID in RAM.
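The "parity/RAID in RAM" analogy can be sketched in a few lines: spread each word across several data chips plus one parity chip holding their XOR, and the contents of any single dead chip can be rebuilt from the survivors. (Real Chipkill uses a symbol-based ECC code rather than plain XOR; this is only the analogy, with made-up helper names.)

```python
def write_word(chips, parity, index, data_bytes):
    """Store one byte per data chip, plus the XOR of all of them on the parity chip."""
    p = 0
    for chip, b in zip(chips, data_bytes):
        chip[index] = b
        p ^= b
    parity[index] = p


def rebuild_byte(chips, parity, dead, index):
    """Reconstruct the byte a failed chip held by XOR-ing parity with the survivors."""
    p = parity[index]
    for i, chip in enumerate(chips):
        if i != dead:
            p ^= chip[index]
    return p


# Eight 4-byte data "chips" and one parity "chip".
chips = [bytearray(4) for _ in range(8)]
parity = bytearray(4)
write_word(chips, parity, 0, b"\x11\x22\x33\x44\x55\x66\x77\x88")

# Pretend chip 2 died; the 0x33 it held is recoverable from the rest.
print(hex(rebuild_byte(chips, parity, dead=2, index=0)))  # 0x33
```

The same XOR trick is what lets a RAID array survive a dead disk, which is why the forum shorthand "RAID in RAM" fits.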