You're missing the other big thing LANL and EMC have done with the new subsystem. Before, all computation had to be completed first, and only then would the nodes do the modeling (pretty video). Now, with the system in place, they can do computation and modeling at the same time, greatly reducing the time to run the "experiment".
The checkpoints now also occur much more frequently than every 4 hours, reducing how much time they lose when a node fails.
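
To put rough numbers on the checkpoint point, here is a back-of-the-envelope sketch in Python. It assumes a failure is equally likely at any moment between two checkpoints, so on average you lose half an interval of work and at worst a full one. The 1-hour and 15-minute intervals are made-up values for comparison (only the 4-hour figure comes from above), and the sketch ignores the cost of writing the checkpoint and of restarting the job.

```python
def expected_lost_hours(checkpoint_interval_hours: float) -> float:
    """Average work lost per node failure.

    Assumption: the failure lands uniformly at random between two
    checkpoints, so on average half an interval has to be redone.
    """
    return checkpoint_interval_hours / 2.0


def worst_case_lost_hours(checkpoint_interval_hours: float) -> float:
    """Work lost if the node fails just before the next checkpoint."""
    return checkpoint_interval_hours


if __name__ == "__main__":
    # 4.0 is the old interval mentioned above; 1.0 and 0.25 are
    # hypothetical shorter intervals, not figures from LANL or EMC.
    for interval in (4.0, 1.0, 0.25):
        print(
            f"interval {interval:>5.2f} h: "
            f"avg lost {expected_lost_hours(interval):.3f} h, "
            f"worst case {worst_case_lost_hours(interval):.2f} h"
        )
```

Under those assumptions the work lost per failure scales linearly with the checkpoint interval, so the gain depends on how cheap the new subsystem makes each checkpoint.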