The point of a well-defined, modern Memory Model is...
1) You can implement concurrency primitives entirely in userland (locks, semaphores, atomic objects, etc.) with "high-level" code in the given language (no recourse to assembly), and with no costly context switch into OS concurrency facilities unless really necessary (e.g. inter-process synchronization).
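To make point 1 concrete, here is a minimal sketch (my example, not from any particular library) of a spinlock written in pure Java: the only primitive it needs is `AtomicBoolean`, whose compare-and-set and volatile-write semantics are exactly what the Memory Model guarantees. No assembly, no syscall on the fast path.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// A minimal spinlock built only on the language's atomic primitives:
// acquire/release ordering comes from the Memory Model, not the OS.
class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Busy-wait until we win the compare-and-set race.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // CPU hint; no context switch
        }
    }

    void unlock() {
        // Volatile write: publishes everything done in the critical
        // section to the next thread that acquires the lock.
        locked.set(false);
    }
}

public class SpinLockDemo {
    static int counter = 0; // plain field, protected by the lock

    public static void main(String[] args) throws InterruptedException {
        SpinLock lock = new SpinLock();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                lock.lock();
                try { counter++; } finally { lock.unlock(); }
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter); // 200000: no increments lost
    }
}
```

A real implementation would add backoff or fall back to parking after spinning a while, but the point stands: the MM lets you build this correctly without leaving the language.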
2) You can write concurrency algorithms that are dangerous and complex, but very fast: code that tolerates controlled data races, lock-free concurrent data structures, etc. See for example Java's Disruptor, or the internals of many java.util.concurrent APIs. Notice that at this level the Memory Model is not useful to most language end-users (like I say, it's really advanced, highly complex programming that's hard to do right even with a good and well-defined MM, but impossible without one). It's a feature targeted at experts who write low-level libraries; most programmers are better off using higher-level things like volatile variables and concurrency APIs.
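As a small taste of what point 2 looks like in practice, here is a sketch of the classic Treiber stack, a lock-free data structure (this is the textbook algorithm, not code from Disruptor or java.util.concurrent). Its correctness rests entirely on the ordering guarantees the Memory Model attaches to `compareAndSet`:

```java
import java.util.concurrent.atomic.AtomicReference;

// Treiber stack: lock-free push/pop via compare-and-set on the head.
// Threads that lose a CAS race simply retry; nobody ever blocks.
class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        Node<T> node = new Node<>(value);
        do {
            node.next = head.get();                     // snapshot top
        } while (!head.compareAndSet(node.next, node)); // retry on race
    }

    T pop() {
        Node<T> top;
        do {
            top = head.get();
            if (top == null) return null;               // empty stack
        } while (!head.compareAndSet(top, top.next));
        return top.value;
    }
}

public class StackDemo {
    public static void main(String[] args) {
        LockFreeStack<Integer> s = new LockFreeStack<>();
        s.push(1); s.push(2); s.push(3);
        System.out.println(s.pop()); // 3 (LIFO order)
        System.out.println(s.pop()); // 2
    }
}
```

Even this "simple" example hides subtleties (e.g. the ABA problem for reclaimed nodes in non-GC languages), which is exactly why this level of the MM is for library experts.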
3) Even for high-level code, the MM makes it easier to have compiler optimizations that are powerful and portable, because they can happen at the HIR level (reasoning about happens-before and similar relations) rather than in the code-generation phase (where every CPU is different); each CPU back-end only needs to provide its own lowering of the memory model's pseudo-instructions. So this is good for multiplatform C++ compilers like GCC and Clang. Java VMs have long relied on the MM to do all sorts of interesting optimizations, such as lock elision (lock/unlock operations are simply discarded when the JIT can prove it's safe... and yeah, this happens VERY often). Even well before 2005, JVMs did many dirty concurrency optimizations; they just weren't standardized, so advanced code that relied on well-defined behavior of races would not be portable.
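A simple illustration of the lock elision mentioned in point 3 (my example; the elision itself is done by the JIT and is invisible in source code): `StringBuffer` has synchronized methods, but when escape analysis proves the object never leaves the method, the MM lets the JIT drop the locking entirely.

```java
// StringBuffer's append() and toString() are synchronized, but 'buf'
// never escapes this method, so no other thread can ever observe it.
// Under the MM the JIT may therefore elide every lock/unlock here,
// making this as cheap as the unsynchronized StringBuilder.
public class LockElisionDemo {
    static String concat(String a, String b) {
        StringBuffer buf = new StringBuffer(); // thread-local lock
        buf.append(a);
        buf.append(b);
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("hello, ", "world"));
    }
}
```

The observable behavior is unchanged, which is the whole point: the MM defines exactly which reorderings and eliminations a conforming implementation may perform.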