Diagnosing Java Native Memory Issues

Symptoms

  • The Jenkins java process is being oom-killed
  • top analysis shows unusual memory consumption outside of heap memory and metaspace memory

Environment

Diagnosis/Treatment

  • Pre-conditions:
  • You have already confirmed expected heap utilization is normal through garbage collection log analysis.
  • You have already confirmed expected metaspace memory consumption.
  • You are running a supported JDK which includes jcmd
  • You are engaged with CloudBees Support via an exisiting support case

Summary

This article describes how to enable Java native memory tracking to trace memory issues that may lie outside of the JVM Heap and Metaspace, as illustrated in the following picture:

nmt_mem.png

Configure NMT Tracking

Step 1
First, you will need to add the JVM argument: -XX:NativeMemoryTracking=summary by following the instructions outlined here: How to add Java Arguments to Jenkins

In order for the new argument to take affect, the Jenkins java process must be restarted. It is recommended to do this during a scheduled maintenance window.

The Oracle documentation on native memory troubleshooting can be found here: https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html

Step 2

To gather a summary, the following jcmd command can be ran:

jcmd <pid> VM.native_memory summary

This will output a summary log similar to the following:

Native Memory Tracking:

Total: reserved=9769936KB +130080KB, committed=8644184KB +212676KB

-                 Java Heap (reserved=7340032KB, committed=7340032KB)
                            (mmap: reserved=7340032KB, committed=7340032KB)

-                     Class (reserved=1279073KB +47364KB, committed=262493KB +52484KB)
                            (classes #36038 +5114)
                            (malloc=9313KB +2308KB #183403 +54470)
                            (mmap: reserved=1269760KB +45056KB, committed=253180KB +50176KB)

-                    Thread (reserved=267357KB -41298KB, committed=267357KB -41298KB)
                            (thread #267 -40)
                            (stack: reserved=266176KB -41120KB, committed=266176KB -41120KB)
                            (malloc=870KB -132KB #1338 -200)
                            (arena=312KB -47 #529 -80)

-                      Code (reserved=278652KB +13258KB, committed=183820KB +90738KB)
                            (malloc=29052KB +13258KB #30409 +10291)
                            (mmap: reserved=249600KB, committed=154768KB +77480KB)

-                        GC (reserved=371875KB +18969KB, committed=371875KB +18969KB)
                            (malloc=66723KB +18969KB #364357 +127871)
                            (mmap: reserved=305152KB, committed=305152KB)

-                  Compiler (reserved=768KB -205KB, committed=768KB -205KB)
                            (malloc=637KB -205KB #3163 +746)
                            (arena=131KB #6)

-                  Internal (reserved=164392KB +103183KB, committed=164388KB +103179KB)
                            (malloc=164356KB +103179KB #69335 +8065)
                            (mmap: reserved=36KB +4KB, committed=32KB)

-                    Symbol (reserved=36296KB +3974KB, committed=36296KB +3974KB)
                            (malloc=31877KB +3623KB #359749 +42849)
                            (arena=4419KB +352 #1)

-    Native Memory Tracking (reserved=15871KB +3803KB, committed=15871KB +3803KB)
                            (malloc=43KB #500 +5)
                            (tracking overhead=15828KB +3804KB)

-               Arena Chunk (reserved=1284KB -18968KB, committed=1284KB -18968KB)
                            (malloc=1284KB -18968KB)

-                   Unknown (reserved=14336KB, committed=0KB)
                            (mmap: reserved=14336KB, committed=0KB)

Note that there are 11 different areas of memory consumption from the JVM. Most notably, “Java Heap” is the amount of heap space allocated to the JVM. Also, “Class” can be traced to metaspace, as this is where class metadata is stored. The other 9 areas should hover around 10-250MB respectively. When we see areas of native memory above 1GB it is considered abnormal.

Create NMT summaries

Now that you have NMT Summarization enabled, running the aforementioned jcmd command hourly should allow you to review any abnormalities of native memory. This can be achieved programtically, either by running a simple script external to Jenkins, or by creating a Jenkins job from within the application. We will provide both options below. Please keep in mind that collecting NMT data has roughly a 10% resource overhead cost to the JVM.

1.) Collecting the data external to Jenkins via Bash script

The following example will collect the data hourly when called from a cron trigger on the host OS.

#!/bin/bash
TSTAMP="$(date +'%Y%m%d_%H%M%S')"
jenkinsPid="$(pgrep -o java)"
nmtLog="$JENKINS_HOME/support/nmt.log"
echo $TSTAMP $JENKINS_CLUSTER_ID >> $nmtLog

jcmd $jenkinsPid VM.native_memory summary >> $nmtLog

Save this bash script as a file named nmtlogging.sh in $JENKINS_HOME/support

Add the script to your crontab by running the following command:

crontab -e

Append the following entry:

0 * * * * JENKINS_HOME='/path/to/jenkins-home' $JENKINS_HOME/support/nmtlogging.sh

Save and close the file.

2.) Collecting the data via a Jenkins job

Creating a new Freestyle job with a ‘Build periodically’ trigger on the Jenkins master where you have enabled NMT Summary tracking will allow you to run the script mentioned above on an hourly basis as highlighted in the following screenshots:

freestyle_nmt.png

nmt_restrict1.png

nmt_cron.png

nmt_build.png

Analyze NMT summaries

Once you have 24 hours of summary data appended to the $nmtLog please upload them to your existing support ticket within CloudBees Support for review. If your archive is larger than 20Mb please use this [service] (https://uploads.cloudbees.com/) to send it to us. This service works best in Chrome or Firefox.

Have more questions?

0 Comments

Please sign in to leave a comment.