How to set up monitoring tools for a Java application

In this blog post I’ll show how to set up two monitoring tools that are widely used for Java applications — Java Flight Recorder with JDK Mission Control and Prometheus with Grafana.

Project preparation

The entire code base mentioned throughout the article is available on GitHub — monitoring-sandbox. It contains the source code of a Java application and all configuration files for both monitoring tools.

Setting up Java Flight Recorder with JDK Mission Control

Let’s now move to the more practical part of the post and have a look at JDK Flight Recorder (JFR) and JDK Mission Control (JMC).

> java -XX:StartFlightRecording:filename=/recordings/recording.jfr,dumponexit=true -jar app.jar
  • dumponexit - specifies whether JFR should dump (save) the recording when the application crashes or simply stops. In our case the value is true, which means the recording will be persisted.
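The same kind of recording can also be started programmatically with the jdk.jfr API instead of a command-line flag. Here is a minimal sketch (the class name, file name, and the sleep standing in for application work are just for illustration):

```java
import java.nio.file.Path;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

public class ProgrammaticRecording {
    // Roughly equivalent to -XX:StartFlightRecording:filename=...,dumponexit=true
    static Path record(Path destination) throws Exception {
        // "default" is one of the event-setting presets shipped with the JDK
        Configuration config = Configuration.getConfiguration("default");
        try (Recording recording = new Recording(config)) {
            recording.setDumpOnExit(true);          // same as dumponexit=true
            recording.setDestination(destination);  // same as filename=...
            recording.start();
            Thread.sleep(200); // stand-in for real application work
            recording.stop();  // flushes recorded events to the destination file
        }
        return destination;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("recording written to " + record(Path.of("recording.jfr")));
    }
}
```

The programmatic API is handy when you want to start and stop recordings in reaction to application events rather than for the whole process lifetime.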
> docker-compose up -d jvm
> docker-compose down
> docker exec -it jvm-app sh
sh-4.4# ls
app.jar  bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  recordings  root  run  sbin  srv  sys  tmp  usr  var
sh-4.4# ls $JAVA_HOME/bin
jar  jarsigner  java  javac  javadoc  javap  jcmd  jconsole  jdb  jdeprscan  jdeps  jfr  jhsdb  jimage  jinfo  jlink  jmap  jmod  jpackage  jps  jrunscript  jshell  jstack  jstat  jstatd  keytool  rmiregistry  serialver
sh-4.4# jcmd
1 app.jar
183 jdk.jcmd/
sh-4.4# jcmd 1 JFR.check
Recording 1: name=1 maxsize=250.0MB (running)
sh-4.4# jcmd 1 JFR.dump filename=/recordings/short-recording.jfr
Dumped recording, 3.2 MB written to:
sh-4.4# jfr summary /recordings/short-recording.jfr
Version: 2.1
Chunks: 1
Start: 2021-09-26 17:26:22 (UTC)
Duration: 1081 s
Event Type Count Size (bytes)
jdk.NativeMethodSample 53054 635109
jdk.JavaMonitorWait 51660 1393022
jdk.ThreadPark 2489 96828
jdk.GCPhaseParallel 2195 53536
jdk.CheckPoint 1184 833801
jdk.CPULoad 1079 21547
jdk.JavaThreadStatistics 1079 12915
jdk.ClassLoadingStatistics 1079 11836
jdk.CompilerStatistics 1079 31258
jdk.ExceptionStatistics 1079 15073
jdk.ModuleExport 946 10256
jdk.ObjectAllocationSample 780 11772
jdk.BooleanFlag 533 15762
jdk.ActiveSetting 355 10945
jdk.ThreadCPULoad 314 5311
sh-4.4# jfr print /recordings/short-recording.jfr
jdk.ActiveSetting {
  startTime = 17:26:22.702
  id = 1185
  name = "threshold"
  value = "0 ns"
  eventThread = "main" (javaThreadId = 1)
}
jdk.ActiveSetting {
  startTime = 17:26:22.702
  duration = 0.732 ms
  id = 132
  name = "period"
  value = "beginChunk"
  eventThread = "main" (javaThreadId = 1)
}
jdk.ActiveSetting {
  startTime = 17:26:22.702
  duration = 0.732 ms
  id = 132
  name = "enabled"
  value = "true"
  eventThread = "main" (javaThreadId = 1)
}
sh-4.4# jfr print --categories 'GC' /recordings/short-recording.jfr
jdk.GCPhasePause {
  startTime = 17:32:37.583
  duration = 3.55 ms
  gcId = 10
  name = "GC Pause"
  eventThread = "VM Thread" (osThreadId = 14)
}
jdk.GarbageCollection {
  startTime = 17:32:37.583
  duration = 3.55 ms
  gcId = 10
  name = "G1New"
  cause = "Metadata GC Threshold"
  sumOfPauses = 3.55 ms
  longestPause = 3.55 ms
}
jdk.YoungGarbageCollection {
  startTime = 17:32:37.583
  duration = 3.55 ms
  gcId = 10
  tenuringThreshold = 15
}
jdk.G1GarbageCollection {
  startTime = 17:32:37.583
  duration = 3.55 ms
  gcId = 10
  type = "Concurrent Start"
}

Setting up JVM dashboards in Grafana

The previously discussed JFR is an amazing tool for getting very in-depth information about a running JVM. The tricky part, however, is setting it up so that information is collected and stored continuously, without manually dumping data every time we want to run an analysis. A solution to this inconvenience is JFR Event Streaming, but since it was introduced only in Java 14, it's still in its early stages and not many tools have been built on top of it. A likely reason is that the vast majority of the industry still sticks with the latest LTS version of Java, which is 11. With the recent release of Java 17, also an LTS version, this should change as more and more companies migrate to it.
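To give an idea of what JFR Event Streaming looks like, here is a minimal in-process sketch (JDK 14+) that subscribes to the periodic jdk.CPULoad event; the class name and the three-second window are just for demonstration:

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import jdk.jfr.consumer.RecordingStream;

public class JfrStreamingDemo {
    // Streams jdk.CPULoad events from the current JVM for the given time window
    static int streamCpuLoad(long windowMillis) throws InterruptedException {
        AtomicInteger samples = new AtomicInteger();
        try (RecordingStream rs = new RecordingStream()) {
            // Ask the JVM to emit a CPU load sample every second
            rs.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));
            rs.onEvent("jdk.CPULoad", event -> {
                samples.incrementAndGet();
                System.out.printf("jvm=%.1f%% machine=%.1f%%%n",
                        event.getFloat("jvmUser") * 100,
                        event.getFloat("machineTotal") * 100);
            });
            rs.startAsync();            // consume events on a background thread
            Thread.sleep(windowMillis); // let a few periodic events arrive
        }
        return samples.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("received " + streamCpuLoad(3000) + " CPU load events");
    }
}
```

With this API, a sidecar thread could push such samples to any metrics backend continuously, which is exactly the gap that the tools below fill today.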

To collect and visualize metrics continuously we’ll use two tools:

  • Prometheus — scrapes and stores metrics from applications,
  • Grafana — visualizes the stored metrics.
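For reference, a minimal prometheus.yml scrape configuration for such a setup might look like the following; the job name, target host, port, and metrics path are assumptions based on a typical Micrometer/Spring Boot setup, not taken verbatim from the repository:

```yaml
scrape_configs:
  - job_name: 'jvm-app'                   # hypothetical job name
    metrics_path: '/actuator/prometheus'  # typical Spring Boot Actuator endpoint
    scrape_interval: 5s
    static_configs:
      - targets: ['jvm-app:8080']         # container name and port are assumptions
```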
# HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the (young) heap memory pool after one GC to before the next
# TYPE jvm_gc_memory_allocated_bytes_total counter
jvm_gc_memory_allocated_bytes_total{application="jvm-app",} 6.291456E7
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{application="jvm-app",area="nonheap",id="CodeHeap 'non-nmethods'",} 5840896.0
jvm_memory_max_bytes{application="jvm-app",area="nonheap",id="Metaspace",} -1.0
jvm_memory_max_bytes{application="jvm-app",area="heap",id="G1 Old Gen",} 3.31350016E9
jvm_memory_max_bytes{application="jvm-app",area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.22908672E8
jvm_memory_max_bytes{application="jvm-app",area="heap",id="G1 Survivor Space",} -1.0
jvm_memory_max_bytes{application="jvm-app",area="nonheap",id="Compressed Class Space",} 1.073741824E9
jvm_memory_max_bytes{application="jvm-app",area="heap",id="G1 Eden Space",} -1.0
jvm_memory_max_bytes{application="jvm-app",area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 1.22908672E8
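The metric names and the application tag above follow Micrometer's conventions, so one plausible way such output is produced (a sketch assuming the micrometer-core and micrometer-registry-prometheus dependencies are on the classpath; the class name is illustrative) is:

```java
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;

public class MetricsSetup {
    public static void main(String[] args) {
        // Registry that renders metrics in the Prometheus text exposition format
        PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
        // Tag every metric with the application name, as seen in the output above
        registry.config().commonTags("application", "jvm-app");
        // Bind the built-in JVM memory metrics (jvm_memory_max_bytes etc.)
        new JvmMemoryMetrics().bindTo(registry);
        // scrape() returns the text that Prometheus fetches from the endpoint
        System.out.println(registry.scrape());
    }
}
```

In a Spring Boot application this wiring happens automatically once the Actuator and the Prometheus registry dependencies are present.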
│   ├───dashboards
│   └───provisioning
│       ├───dashboards
│       └───datasources
> docker compose up -d
[+] Building 33.6s (14/14) FINISHED
... 0.0s
Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
[+] Running 4/4
- Network "monitoring-sandbox_default" Created 0.0s
- Container prometheus Started 1.2s
- Container jvm-app Started 1.2s
- Container grafana Started 3.3s


In this article I’ve shown how to set up two popular monitoring tools that can be used in your production system. They offer lots of in-depth information about what’s happening with your application and the JVM, which can be especially helpful when something mysterious starts to happen. Information about memory allocation, garbage collection, CPU usage, etc. can then help you find the root cause. But how do you make sense of all these metrics? That’s a story for another article, which will be published soon.

Java Software Developer, DevOps newbie, constant learner, podcast enthusiast.