Java Memory Model
1. The Problem
In Java, program code can change a lot between its source code form, bytecode form and machine code form. The Java source code is supposed to focus more on readability and clarity, while the machine code focuses more on performance and efficiency. That's why the JVM is allowed to optimise the code, with different degrees of optimisation (depending on the compilation stage), as long as it remains correct. However, this can be very tricky, especially in the context of multi-threaded applications.
1.1 Sequential Consistency
Roughly, while executing a program there are multiple levels of caching. The processor never operates on values directly in main memory; instead it loads the values into its own cache, manipulates them, then writes them back to main memory. Let's take the following example:
class Reorder {
    int foo = 0;
    int bar = 0;

    void calc() {
        foo += 1; //#1
        bar += 1; //#2
        foo += 2; //#3
    }
}
How can the processor execute the method calc() in memory?
- Load foo from main memory into the processor cache, increment it by 1, write it back to main memory (#1).
- Load bar from main memory into the processor cache, increment it by 1, write it back to main memory (#2).
- Load foo from main memory into the processor cache, increment it by 2, write it back to main memory (#3).
How can the previous example be optimized? A simple approach: swap instructions #2 and #3:
void calc() {
    foo += 1; //#1
    foo += 2; //#3
    bar += 1; //#2
}
- Load foo from main memory into the processor cache, increment it by 1, increment it by 2, write it back to main memory (#1 and #3).
- Load bar from main memory into the processor cache, increment it by 1, write it back to main memory (#2).
In a single-threaded program, this optimisation can be considered free of side effects; in a multi-threaded world, however, it introduces some abnormal behavior.
The possible values of the variables over time in the two cases show the subtle difference:
- Before optimisation:
- (foo == 0, bar == 0)
- (foo == 1, bar == 0)
- (foo == 1, bar == 1)
- (foo == 3, bar == 1)
- After optimisation:
- (foo == 0, bar == 0)
- (foo == 1, bar == 0)
- (foo == 3, bar == 0)
- (foo == 3, bar == 1)
The previous example is an optimisation that the JVM is allowed to do. The JVM can do much more complex optimisations, but the outcome might be unexpected in a multi-threaded world! So why optimise at all? The answer: memory access latency!
1.2 Eventual Consistency
A machine can have multiple processors, and (at some level) each processor has its own cache, which means each processor loads only the values it needs for its operations.
Let's say we have two processors and the following program:
class Caching {
    boolean flag = true;
    int count = 0;

    void thread1() {
        while (flag) count++;
    }

    void thread2() {
        flag = false;
    }
}
Let's say processor #1 will run the method thread1() and processor #2 will run the method thread2(). An optimisation can be the following:
- Since thread1() never modifies the flag variable, there is no need to load it from main memory for each loop check; loading it once into the cache is enough -> the change to flag might never be observed!
- Processor #2 has no obligation to write its change to the flag variable back to main memory! Which means an optimisation can be to simply not perform the write at all!
1.3 Atomicity
Atomicity in Java would mean treating every value as atomic, i.e. the modification of a variable (for example, of 64-bit types like long and double) happens as a single, indivisible operation. The Java memory model does not guarantee this for non-volatile long and double fields.
Let's consider the following example:
class LongTearing {
    long foo = 0L;

    void thread1() {
        foo = 0x0000FFFF; // halves: 0000 and FFFF
    }

    void thread2() {
        foo = 0xFFFF0000; // halves: FFFF and 0000
    }
}
A 64-bit long variable is written in two slots on a 32-bit memory architecture, so a problem can occur here:
- thread1() writes the first half of its value to memory: 0000.
- thread2() writes the second half of its value to memory: 0000.
- thread1() writes the second half of its value to memory: FFFF.
- thread2() writes the first half of its value to memory: FFFF.
- The final value of the variable will then be: 0xFFFFFFFF !!!
1.4 Processor Optimization
Operation ordering is sometimes tied to the processor architecture. Optimisation needs can differ, for example, between ARM and x86 processors: ARM processors can reorder more aggressively because they are designed for energy efficiency, while x86 processors are more focused on computation speed.
2. What is the Java Memory Model?
The Java memory model answers the question: which values can be observed upon reading from a specific field?
It is formally specified by breaking down a Java program into actions and applying several orderings to these actions. If one can derive a so-called happens-before ordering between the write actions and a read action of a field, the Java memory model guarantees that the read returns a particular value.
The Java memory model guarantees intra-thread consistency equivalent to sequential consistency.
2.1 Building blocks
According to the Java memory model, using the following keywords, a programmer can indicate to the JVM that it should refrain from optimisations that could otherwise cause concurrency issues:
- Field-scoped: final, volatile.
- Method-scoped: synchronized (method/block), java.util.concurrent.*.
In terms of the Java memory model, the above concepts introduce additional synchronization actions, which in turn introduce additional (partial) orders. Without such modifiers, reads and writes might not be ordered, which results in a data race.
A memory model is a trade-off between a language’s simplicity (consistency/atomicity) and its performance.
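As a rough illustration (the class and member names below are made up for this sketch, not taken from the specification), the building blocks can all appear in a single class; each of them introduces its own synchronization actions:
import java.util.concurrent.atomic.AtomicInteger;

class BuildingBlocks {
    final int constant = 42;        // field-scoped: frozen once the constructor finishes
    volatile boolean ready = false; // field-scoped: reads/writes are ordered and flushed

    AtomicInteger counter = new AtomicInteger(); // java.util.concurrent.*: atomic, volatile-like access

    synchronized void update() {    // method-scoped: unlock-lock ordering on this object's monitor
        counter.incrementAndGet();
        ready = true;
    }
}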
2.2 Volatile
Let's take the following example:
class DataRace {
    boolean ready = false;
    int answer = 0;

    void thread1() {
        while (!ready);
        assert answer == 42;
    }

    void thread2() {
        answer = 42; // #1
        ready = true; // #2
    }
}
The lines #1 and #2 can be reordered! Which means the assertion in method thread1() can actually fail in a multi-threaded world!
A solution? The keyword volatile:
class DataRace {
    volatile boolean ready = false;
    int answer = 0;

    void thread1() {
        while (!ready);
        assert answer == 42;
    }

    void thread2() {
        answer = 42; // #1
        ready = true; // #2
    }
}
volatile implies that for two threads with a write-read relationship on the *same field*, certain optimisations are not allowed:
- When a thread writes to a volatile variable, all of its previous writes are guaranteed to be visible to another thread when that thread reads the same variable.
- Both threads must align "their" volatile value with the one in main memory (flush).
- If the volatile value is a long or a double value, word-tearing is forbidden.
2.3 Synchronized
Another way to achieve synchronization is by using synchronized.
Let's check the following example, assuming the second thread acquires the lock first:
class DataRace {
    boolean ready = false;
    int answer = 0;

    synchronized void thread1() {
        while (!ready);
        assert answer == 42;
    }

    synchronized void thread2() { // Assuming this is called first
        answer = 42;
        ready = true;
    }
}
When a thread releases a monitor, all of its previous writes are guaranteed to be visible to another thread after that thread locks the same monitor. This only applies to two threads with an *unlock-lock relationship* on the same monitor!
2.4 Thread life-cycle semantics
When a thread starts another thread, the started thread is guaranteed to see all values that were set by the starting thread.
class ThreadLifeCycle {
    int foo = 0;

    void method() {
        foo = 42;
        new Thread() {
            @Override public void run() {
                assert foo == 42;
            }
        }.start();
    }
}
Similarly, a thread that joins another thread is guaranteed to see all values that were set by the joined thread.
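A minimal sketch of the join case (the class name and structure are illustrative, not from the original example):
class ThreadJoin {
    int foo = 0;

    void method() throws InterruptedException {
        Thread worker = new Thread(() -> { foo = 42; });
        worker.start();
        worker.join(); // after join() returns, the worker's writes are visible
        assert foo == 42;
    }
}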
2.5 Final field semantics
When a thread creates an instance, the instance's final fields are frozen. The Java memory model requires a field's initial value to be visible in its initialized form to other threads.
This requirement also holds for properties that are dereferenced via a final field, even if the field value's properties are not final themselves (memory-chain order).
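A minimal sketch of what the freeze guarantees, assuming the instance is published to another thread without any further synchronization (the class name is illustrative, modelled after the well-known JLS example):
class FinalFieldExample {
    final int foo;
    int bar;

    FinalFieldExample() {
        foo = 42;
        bar = 42;
    }

    static FinalFieldExample instance;

    static void writer() {
        instance = new FinalFieldExample(); // publishes the instance without synchronization
    }

    static void reader() {
        FinalFieldExample local = instance;
        if (local != null) {
            assert local.foo == 42; // guaranteed: the final field is frozen at the end of the constructor
            // local.bar carries no such guarantee and might still be observed as 0
        }
    }
}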
2.6 External actions
A JIT-compiler cannot determine the side-effects of a native operation. Therefore, external actions are guaranteed to not be reordered.
class Externalization {
    int foo = 0;

    void method() {
        foo = 42;
        jni(); // Not re-ordered
    }

    native void jni();
}
External actions include JNI, socket communication, file system operations or interaction with the console (non-exclusive list).
2.7 Thread-divergence actions
Thread-divergence actions are guaranteed to not be reordered. This prevents surprising outcomes of actions that might never be reached.
class ThreadDivergence {
    int foo = 42;

    void thread1() {
        while (true);
        foo = 0; // Not re-ordered
    }

    void thread2() {
        assert foo == 42;
    }
}
In the previous example, the line foo = 0 in the method thread1() is unreachable, and is therefore not re-ordered.
3. In Practice
The following are some practical examples of Java Memory Model use (or misuse).
3.1 Double checking
The following is a lazy instance creation example:
class DoubleChecked {
    static volatile DoubleChecked instance;

    static DoubleChecked getInstance() {
        if (instance == null) {
            synchronized (DoubleChecked.class) {
                if (instance == null) {
                    instance = new DoubleChecked();
                }
            }
        }
        return instance;
    }

    int foo = 0;

    DoubleChecked() { foo = 42; }

    void method() { assert foo == 42; }
}
This example works because of volatile; omitting it may result in another thread observing an instance that has been created but not yet initialized!
3.2 Arrays
Declaring an array to be volatile does not make its elements volatile! In the following example, there is no write-read edge on a volatile field, because the array reference itself is only read by the threads; only its element is written and read:
class DataRace {
    volatile boolean[] ready = new boolean[] { false };
    int answer = 0;

    void thread1() {
        while (!ready[0]);
        assert answer == 42;
    }

    void thread2() {
        answer = 42;
        ready[0] = true;
    }
}
For such volatile element access, use java.util.concurrent.atomic.AtomicIntegerArray.
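A rough sketch of the example above rewritten with AtomicIntegerArray, where reads and writes of individual elements have volatile semantics (1 stands in for true):
import java.util.concurrent.atomic.AtomicIntegerArray;

class AtomicArrayAccess {
    final AtomicIntegerArray ready = new AtomicIntegerArray(1); // elements start at 0
    int answer = 0;

    void thread1() {
        while (ready.get(0) == 0); // volatile-like read of element 0
        assert answer == 42;
    }

    void thread2() {
        answer = 42;
        ready.set(0, 1); // volatile-like write of element 0 creates the write-read edge
    }
}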