JVM Architecture

Java source code is compiled into an intermediate form called bytecode (i.e. a .class file) by the Java compiler (javac). The Java Virtual Machine, a.k.a. the JVM, interprets this bytecode (without further recompilation) into native machine language. Bytecode therefore acts as a platform-independent intermediate representation, portable across any JVM regardless of the underlying OS and hardware architecture.
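
As a minimal illustration of this cycle (the class name Hello is just an example), the file below is compiled once with javac into a platform-independent Hello.class file, and that same bytecode can then be run by any JVM:

```java
// Hello.java -- compile with:  javac Hello.java   (produces the bytecode file Hello.class)
// Run the bytecode on any JVM: java Hello
public class Hello {
    public static void main(String[] args) {
        // Prints the name of the JVM implementation executing this bytecode.
        System.out.println("Running on: " + System.getProperty("java.vm.name"));
    }
}
```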

The JVM is a specification. Vendors are free to customize, innovate, and improve its performance during the implementation.

JVM Architecture

Java Virtual Machine Architecture

1. Class Loader Subsystem

The JVM resides in RAM. During execution, class files are brought into RAM by the Class Loader subsystem. This is Java’s dynamic class loading functionality: it loads, links, and initializes a class file (.class) when the class is referenced for the first time at runtime (not at compile time).

1.1. Loading

Java Class Loaders

Note: It is possible to create a user-defined class loader directly in the code itself, as sketched below.
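
As a small sketch (not taken from the original text), the class below shows both ideas: a user-defined class loader created directly in code by extending ClassLoader, and dynamic loading of a class by name at runtime. The readClassBytes method is a hypothetical placeholder for reading raw .class bytes.

```java
// A minimal user-defined class loader: findClass() is expected to obtain the bytes of a class
// and hand them to defineClass(), which turns them into a Class object.
public class MyClassLoader extends ClassLoader {
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytecode = readClassBytes(name);              // hypothetical: load raw .class bytes
        return defineClass(name, bytecode, 0, bytecode.length);
    }

    private byte[] readClassBytes(String name) throws ClassNotFoundException {
        throw new ClassNotFoundException(name);              // placeholder for real byte loading
    }

    public static void main(String[] args) throws Exception {
        // Dynamic class loading by name: the class is resolved at runtime, not at compile time.
        Class<?> listClass = Class.forName("java.util.ArrayList");
        System.out.println("Loaded by: " + listClass.getClassLoader()); // bootstrap loader prints null
    }
}
```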

1.2. Linking

Linking verifies and prepares a loaded class or interface, its direct superclasses and superinterfaces, and its element type as necessary. It involves three steps: verification, preparation, and (optionally) resolution of symbolic references.

1.3. Initialization

The initialization logic of each loaded class or interface is then executed (e.g. the static initializer blocks and static field assignments of a class). Since the JVM is multi-threaded, initialization of a class or interface must happen very carefully, i.e. in a thread-safe way; the JVM guarantees that it runs exactly once, as sketched below.
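
A minimal sketch of that guarantee (the nested Config class is a made-up example): its static initializer runs exactly once, even when two threads reach the class at the same time.

```java
public class InitDemo {
    // The JVM runs a class's static initialization logic exactly once, on first use,
    // and does so in a thread-safe way even when several threads reach the class concurrently.
    static class Config {
        static final long LOADED_AT;
        static {
            System.out.println("Initializing Config (printed only once)");
            LOADED_AT = System.currentTimeMillis();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + " sees " + Config.LOADED_AT);
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```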

2. Runtime Data Areas

Runtime Data Areas are the memory areas assigned when the JVM program runs on the OS.
In addition to reading .class files, the Class Loader subsystem generates the corresponding binary data and saves class-level information (such as the fully qualified names of the loaded class and its immediate parent, and its method and variable information) in the Method Area, separately for each class.

For every loaded .class file, the JVM also creates exactly one object of type java.lang.Class to represent the file in the Heap area. This Class object can be used later in the code to read class-level information (class name, parent name, methods, variable information, static variables, etc.).
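
A short sketch using the standard reflection API to read that class-level information from the single Class object:

```java
public class ClassInfoDemo {
    public static void main(String[] args) throws Exception {
        // Every loaded class is represented by exactly one java.lang.Class object in the Heap.
        Class<?> clazz = Class.forName("java.lang.String");

        System.out.println("Class name: " + clazz.getName());
        System.out.println("Superclass: " + clazz.getSuperclass().getName());
        System.out.println("Methods:    " + clazz.getDeclaredMethods().length);
        System.out.println("Fields:     " + clazz.getDeclaredFields().length);

        // The same Class object is returned on every lookup -- there is only one per loaded class.
        System.out.println(clazz == "hello".getClass());   // prints true
    }
}
```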

2.1 Method Area (Shared)

This is a shared resource (only one Method Area per JVM). All JVM threads share this same Method Area, which means access to the method data and the process of dynamic linking must be thread safe.
The Method Area stores class-level data (including static variables) such as the run-time constant pool, field and method data, and the code for methods and constructors.

2.2 Heap Area (Shared)

This is also a shared resource (only one Heap area per JVM). All objects, together with their corresponding instance variables and arrays, are stored in the Heap area. The Heap area is the primary target of garbage collection.
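
As a rough sketch, every new expression below allocates in this shared heap; the -Xms and -Xmx options mentioned in the comments are the standard flags for setting its initial and maximum size.

```java
public class HeapDemo {
    // Instance variables live inside objects, and objects (including arrays) are allocated
    // in the shared Heap area.
    private int[] buffer = new int[1_000_000];

    public static void main(String[] args) {
        // The heap's size is typically bounded with the standard -Xms / -Xmx options,
        // e.g.  java -Xms256m -Xmx512m HeapDemo
        HeapDemo demo = new HeapDemo();
        System.out.println("Allocated an array of " + demo.buffer.length + " ints on the heap");
    }
}
```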

2.3. Stack Area (Per thread)

This is not a shared resource (it is thread safe). Every JVM thread has a separate runtime stack to store method calls. For every method call, one entry is created and pushed onto the top of the runtime stack; such an entry is called a Stack Frame.

JVM Stack Configuration

A Stack Frame is divided into three sub-entities: the Local Variable Array, the Operand Stack, and the Frame Data (which includes a reference to the run-time constant pool).

The frame is removed (popped) when the method returns normally or when an uncaught exception is thrown during the method invocation. Since these are runtime stack frames, once a thread terminates, its runtime stack is also destroyed by the JVM.

Each stack frame has a fixed size, but the stack itself can be of either dynamic or fixed size. If a thread requires a larger stack than allowed, a StackOverflowError is thrown. If a thread requires a new frame and there is not enough memory to allocate it, an OutOfMemoryError is thrown.
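
A minimal sketch of the first case: unbounded recursion keeps pushing frames until the thread’s stack limit is exceeded (the -Xss option mentioned in the comment is the standard flag for setting the per-thread stack size).

```java
public class StackDemo {
    static int depth = 0;

    // Every call pushes a new stack frame; unbounded recursion eventually exceeds
    // the thread's stack size (commonly tuned with the standard -Xss option).
    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("Stack overflowed after roughly " + depth + " frames");
        }
    }
}
```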

2.4. PC Registers (Per thread)

For each JVM thread, a separate PC (Program Counter) Register is created when the thread starts, in order to hold the address of the currently executing instruction (a memory address in the Method Area). If the current method is native, the PC is undefined. Once an instruction finishes executing, the PC register is updated with the address of the next instruction.

2.5. Native Method Stack (Per thread)

There is a direct mapping between a Java thread and a native operating system thread. After preparing all the state for a Java thread, a separate native method stack is also created to store information about native methods invoked through JNI (the Java Native Interface).

Once the native thread has been created and initialized, it invokes the run() method in the Java thread. When the thread terminates, all resources for both the native and Java threads are released. The native thread is reclaimed once the Java thread terminates. The operating system is therefore responsible for scheduling all threads and dispatching them to any available CPU.

3. Execution Engine

The Execution Engine executes the bytecode instruction by instruction, reading the data assigned to the Runtime Data Areas.

3.1. Interpreter

The interpreter interprets the bytecode and executes the instructions one by one. It can interpret a single bytecode instruction quickly, but executing the interpreted result is slow. The disadvantage is that when one method is called multiple times, it must be interpreted anew, and executed slowly, every time.

3.2. Just-In-Time (JIT) Compiler

The JIT compiler compiles the bytecode to native code. For repeated method calls, it then provides the native code directly.

However, compiling takes more time for the JIT compiler than interpreting does for the interpreter, so for a code segment that executes only once, it is better to interpret it rather than compile it. Also, the compiled native code is stored in a code cache, which is an expensive resource. Under these constraints, the JIT compiler internally tracks how frequently each method is called and compiles a method only once its call count exceeds a certain threshold. This idea of adaptive compilation is used in Oracle HotSpot VMs.
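
A hypothetical hot method to illustrate the idea; the -XX:+PrintCompilation flag noted in the comment is a standard HotSpot diagnostic option that logs when methods are JIT-compiled.

```java
public class JitDemo {
    // A small method called many times becomes "hot"; once its invocation count crosses the
    // JIT threshold, HotSpot compiles it to native code instead of re-interpreting it.
    static long square(long x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += square(i);
        }
        System.out.println(sum);
        // Running with:  java -XX:+PrintCompilation JitDemo
        // shows compilation events for hot methods such as JitDemo::square.
    }
}
```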

The Execution Engine is a key subsystem where JVM vendors introduce performance optimizations. Among such efforts, the following 4 components can largely improve its performance:

3.3. Garbage Collector

As long as an object is being referenced, the JVM considers it alive. Once an object is no longer referenced and therefore is not reachable by the application code, the garbage collector removes it and reclaims the unused memory.
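
A minimal sketch of reachability: once the only reference to the array is dropped, the object becomes eligible for collection (System.gc() is only a hint; the collector decides when to actually reclaim the memory).

```java
public class GcDemo {
    public static void main(String[] args) {
        byte[] data = new byte[10_000_000];   // reachable: the JVM keeps it alive
        System.out.println("Allocated " + data.length + " bytes");

        data = null;                          // no references remain -- the array is now unreachable
        System.gc();                          // a hint only; reclamation happens at the GC's discretion
        System.out.println("The old array is now eligible for garbage collection");
    }
}
```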

4. Java Native Interface (JNI)

This interface is used to interact with Native Method Libraries. It enables the JVM to call C/C++ libraries and to be called by C/C++ libraries, which may be specific to the underlying hardware.
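
As a sketch, a native method is declared in Java with the native keyword and implemented in C/C++; "nativeclock" is a hypothetical library name, and the snippet will only run once a matching native library has been built and placed on the library path.

```java
public class NativeClock {
    // Declared in Java, implemented in C/C++, and reached through JNI.
    public native long currentTicks();

    static {
        // Loads libnativeclock.so / nativeclock.dll at runtime (hypothetical library name).
        System.loadLibrary("nativeclock");
    }

    public static void main(String[] args) {
        System.out.println("Ticks: " + new NativeClock().currentTicks());
    }
}
```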

5. Native Method Libraries

This is a collection of native C/C++ libraries that is required by the Execution Engine and can be accessed through the provided Native Interface.

6. JVM Threads

The JVM concurrently runs multiple threads. Some of these threads carry the programming logic and are created by the program (application threads), while the rest are created by the JVM itself to undertake background tasks in the system (system threads).

The major application thread is the main thread, which is created as part of invoking public static void main(String[]); all other application threads are created by this main thread. Application threads perform tasks such as executing instructions starting from the main() method and creating objects in the Heap area whenever the new keyword is encountered in any method logic.
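
A small sketch: the code below runs in the "main" application thread created by the JVM, and then starts a further application thread of its own (the name "worker-1" is arbitrary).

```java
public class ThreadDemo {
    public static void main(String[] args) {
        // The JVM creates the "main" application thread to execute this method.
        System.out.println("Running in: " + Thread.currentThread().getName());   // prints "main"

        // All other application threads are created from program code, directly or indirectly.
        Thread worker = new Thread(
                () -> System.out.println("Running in: " + Thread.currentThread().getName()),
                "worker-1");
        worker.start();
    }
}
```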

The major system threads are as follows:

7. Conclusion

Java is considered both compiled (high-level Java code into bytecode) and interpreted (bytecode into native machine code). By design, Java is slow due to dynamic linking and run-time interpreting; however, the JIT compiler compensates for the interpreter's disadvantages on repeated operations by keeping native code instead of re-interpreting bytecode.

8. Useful Commands

9. Sources
