Re: Writing parallel code doesn't have to be any harder than writing sequential code.
The difficulties lie in coordinating access to shared data structures, and also in using memory efficiently.
If you have a single stream of code execution, the processor itself implicitly serialises access to data structures. Once you have more than one processor running, you have to make sure that two or more threads running simultaneously do not write to the same data structure at once.
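A minimal sketch of the mutual exclusion this calls for, using a spinlock. It is an illustration under stated assumptions, not how a kernel implements one: the `SpinLock` class and the `parallel_increment` helper are hypothetical, and a `threading.Lock` tried with `blocking=False` stands in for an atomic test-and-set instruction. Without the lock, the `counter += 1` read-modify-write from two threads could interleave and lose updates.

```python
import threading
import time


class SpinLock:
    """Illustrative spinlock: busy-waits on a non-blocking try-acquire."""

    def __init__(self) -> None:
        self._flag = threading.Lock()  # stands in for an atomic test-and-set flag

    def acquire(self) -> None:
        # Keep retrying instead of going to sleep until the flag is free.
        while not self._flag.acquire(blocking=False):
            time.sleep(0)  # yield so the holder can run under CPython's GIL

    def release(self) -> None:
        self._flag.release()

    def __enter__(self) -> "SpinLock":
        self.acquire()
        return self

    def __exit__(self, *exc) -> None:
        self.release()


def parallel_increment(n_threads: int, n_iters: int) -> int:
    """Have n_threads threads each add n_iters to one shared counter."""
    counter = 0
    lock = SpinLock()

    def worker() -> None:
        nonlocal counter
        for _ in range(n_iters):
            with lock:  # the read-modify-write is only safe while holding the lock
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(parallel_increment(4, 10_000))  # 40000
```

On a single-core machine with no preemption none of this would be needed; the lock exists purely because several threads can reach `counter` at the same moment.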
You end up having to deal with spinlocks and other mechanisms that are completely unnecessary with single-processor code execution. It's been this way ever since multiprocessor machines became available, and I worked on my first multiprocessor machine back in 1987. The problem gets much worse as the number of cores goes up, because more threads end up contending for the same locks. There are ways of managing this, such as separating the data into per-thread memory pools, but again it's something you just don't have to think about with single-core machines.
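The per-thread separation idea can be sketched roughly like this. It shows per-thread partitioning of the data rather than a real per-thread memory allocator, which is a simplifying assumption, and `sharded_sum` is a hypothetical helper: each thread works on its own contiguous slice and writes only its own result slot, so no lock is needed until the single merge at the end.

```python
import threading


def sharded_sum(data: list[int], n_threads: int) -> int:
    """Sum data with one private partial result per thread, merged at the end."""
    partials = [0] * n_threads          # slot i is written only by thread i
    chunk = -(-len(data) // n_threads)  # ceiling division: items per thread

    def worker(idx: int) -> None:
        local = 0
        # Each thread reads only its own contiguous slice of the input...
        for value in data[idx * chunk:(idx + 1) * chunk]:
            local += value
        # ...and writes only its own slot, so no lock is needed while running.
        partials[idx] = local

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the merge below runs only after every thread has finished
    return sum(partials)


print(sharded_sum(list(range(1, 101)), 4))  # 5050
```

In C you would normally also pad each per-thread slot out to its own cache line, since slots packed side by side still cause false sharing between cores even though no two threads touch the same value.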
The classic way of handling this has been to build an implicit separation into the work the system is doing: for example, avoiding multi-threaded processes so that there is no data contention within a process, or, like the example you quote, having a state machine serialise access to common data structures. But when you start talking about true parallelisation, with multiple threads working on the same data set, these approaches don't work. HPC programmers have struggled with this problem for many years.
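A loose sketch of the "serialise access through one owner" approach. Rather than a full state machine, it uses a closely related pattern (a single owner thread fed by a request queue, sometimes called actor style), which is a substitution on my part; `SerialisedStore` and its methods are hypothetical. Any number of threads can call `add` and `get`, but only the owner thread ever touches the dictionary, so the queue does the serialising.

```python
import queue
import threading


class SerialisedStore:
    """A dict owned by one thread; other threads send it requests via a queue."""

    def __init__(self) -> None:
        self._requests: queue.Queue = queue.Queue()
        self._data: dict[str, int] = {}
        self._owner = threading.Thread(target=self._run, daemon=True)
        self._owner.start()

    def _run(self) -> None:
        # Only this thread ever reads or writes self._data, so it needs no lock.
        while True:
            op, key, amount, reply = self._requests.get()
            if op == "add":
                self._data[key] = self._data.get(key, 0) + amount
                reply.put(None)
            elif op == "get":
                reply.put(self._data.get(key, 0))
            elif op == "stop":
                reply.put(None)
                return

    def _call(self, op: str, key: str = "", amount: int = 0):
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._requests.put((op, key, amount, reply))
        return reply.get()  # block until the owner thread has handled the request

    def add(self, key: str, amount: int) -> None:
        self._call("add", key, amount)

    def get(self, key: str) -> int:
        return self._call("get", key)

    def stop(self) -> None:
        self._call("stop")
        self._owner.join()


store = SerialisedStore()
adders = [threading.Thread(target=lambda: [store.add("hits", 1) for _ in range(250)])
          for _ in range(4)]
for t in adders:
    t.start()
for t in adders:
    t.join()
print(store.get("hits"))  # 1000
store.stop()
```

This works well precisely because all updates funnel through one thread, which is also why it stops scaling once you want many threads working on the same data in parallel.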
You also have the problem that modern multiprocessor machines are normally NUMA (non-uniform memory access), which means that to get the best out of the machine, you need some idea of how to place memory close to the CPUs running the threads that use the bulk of the data.
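Python cannot control where physical pages land, so this only sketches the bookkeeping side of NUMA-aware placement, under a hypothetical two-node topology: split the data into one contiguous range per node, sized by that node's CPU count, so each node's threads work mostly on their own range. In C on Linux, each range would typically be first touched by a thread pinned to that node's CPUs (the default policy allocates a page on the node of the CPU that first touches it), or allocated explicitly with libnuma's `numa_alloc_onnode`. `numa_partition` and `HYPOTHETICAL_TOPOLOGY` are illustrative names.

```python
def numa_partition(n_items: int, topology: dict[int, list[int]]) -> dict[int, range]:
    """Split n_items into one contiguous index range per NUMA node.

    Each node's share is proportional to its CPU count, so the threads pinned
    to that node can work on memory that was allocated on that node.
    """
    total_cpus = sum(len(cpus) for cpus in topology.values())
    plan: dict[int, range] = {}
    start = 0
    seen_cpus = 0
    for node in sorted(topology):
        seen_cpus += len(topology[node])
        # Cumulative rounding keeps ranges contiguous and covers every item.
        end = n_items * seen_cpus // total_cpus
        plan[node] = range(start, end)
        start = end
    return plan


# Hypothetical two-node machine: node 0 has CPUs 0-3, node 1 has CPUs 4-7.
HYPOTHETICAL_TOPOLOGY = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
print(numa_partition(1000, HYPOTHETICAL_TOPOLOGY))
# {0: range(0, 500), 1: range(500, 1000)}
```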
Both of these problems get much worse if you don't know the shape of the machine (core count, NUMA layout) at the time you are writing the code.
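One partial answer is to discover the shape at run time instead of baking it in. This is a sketch that assumes Linux: it reads the per-node `cpulist` files that Linux exposes under `/sys/devices/system/node/`, and elsewhere falls back to treating the machine as a single node. `parse_cpulist` and `discover_topology` are hypothetical helpers; dedicated libraries such as hwloc do this far more thoroughly.

```python
import glob
import os


def parse_cpulist(text: str) -> list[int]:
    """Parse a Linux cpulist string such as '0-3,8-11' into CPU numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes have an empty cpulist
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def discover_topology() -> dict[int, list[int]]:
    """Map NUMA node -> CPUs, read from Linux sysfs when it is available."""
    topology: dict[int, list[int]] = {}
    for path in glob.glob("/sys/devices/system/node/node[0-9]*/cpulist"):
        node_dir = os.path.basename(os.path.dirname(path))  # e.g. "node0"
        with open(path) as f:
            topology[int(node_dir[len("node"):])] = parse_cpulist(f.read())
    if not topology:
        # No sysfs NUMA info (non-Linux, some containers): assume one node.
        if hasattr(os, "sched_getaffinity"):
            topology[0] = sorted(os.sched_getaffinity(0))
        else:
            topology[0] = list(range(os.cpu_count() or 1))
    return topology


print(parse_cpulist("0-3,8-9"))  # [0, 1, 2, 3, 8, 9]
print(discover_topology())
```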
As I read it, this approach abstracts the machine topology away from the code and puts the complex parallelisation into the abstraction layer. If done correctly, this would let programmers write for a single virtual machine shape without having to worry about the underlying hardware, much in the same way that a JVM lets programmers write code that appears processor-neutral.
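As a very loose illustration of that idea (not of the specific system being discussed, which this post doesn't spell out), here is a tiny abstraction layer: callers write against a single `parallel_map` call, and decisions such as how many workers to use stay inside the layer. `parallel_map` is a hypothetical name; `ThreadPoolExecutor` and its `map` method are standard `concurrent.futures` APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_map(fn: Callable[[T], R], items: Iterable[T],
                 workers: Optional[int] = None) -> List[R]:
    """Apply fn to every item; the caller never deals with threads or topology.

    When workers is None, ThreadPoolExecutor derives a default worker count
    from the machine's CPU count, so the same calling code adapts per machine.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, whatever order they finish in.
        return list(pool.map(fn, items))


print(parallel_map(lambda x: x * x, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because callers only see `parallel_map`, the layer could later swap in `concurrent.futures.ProcessPoolExecutor` (same `map` interface) to sidestep CPython's GIL for CPU-bound work, at the cost of requiring `fn` to be picklable, without changing any calling code.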