Problematic Scene
Two optimizations:
- CSE, Common subexpression elimination
- Hoisting, Loop-invariant code motion
When these two are performed together, the outcome can be problematic:
So we have to forbid one transfomation.
- C11 forbids load hoisting, allows CSE over lock().
- LLVM allows load hoisting, forbids CSE over lock().
A WMM should have unambiguous specification, be amenable to formal reasongning, but it’s not easy to get right.
<aside> 💡 WMM Desiderata
3. Basic Operational Semantics for Concurrency
Define program semantics with an abstract machine.
Define the model as a sequence of program transformations over some basic operational model, such as SC.
$$ \textsf{\textcolor{grey}{Example:} TSO = SC + WR-reordering + RaW-elimination} $$
But it’s applicable in very few cases and doesn’t work for ARM:
The ARMv8 architecture employs a weakly-ordered model of memory.
In general terms, this means that the order of memory accesses is not required to be the same as the program order for load and store operations. The processor is able to re-order memory read operations with respect to each other. Writes may also be re-ordered (for example, write combining).
As a result, hardware optimizations, such as the use of cache and write buffer, function in a way that improves the performance of the processor, which means that the required bandwidth between the processor and external memory can be reduced and the long latencies associated with such external memory accesses are hidden.
[ARM Cortex-A Series Programmer's Guide for ARMv8-A](https://developer.arm.com/documentation/den0024/a/Memory-Ordering#:~:text=The ARMv8 architecture employs a,with respect to each other.)