Memory Problem: Process killed by OOM Killer

Issue

  • Jenkins suddenly crashed and dmesg shows:
[XXXXX] Out of memory: Kill process <JENKINS_PID> (java) score <SCORE> or sacrifice child
[XXXXX] Killed process <JENKINS_PID> (java) total-vm:XXXkB, anon-rss:XXXkB, file-rss:XXXkB, shmem-rss:XXXkB

Environment

Background

A Java process is made up of:

  • Java heap space (set via -Xms and -Xmx)
  • the Metaspace
  • the Native Memory area

Each one of these areas will use RAM. The memory footprint of Jenkins (a Java application) is the sum of the maximum Java heap size, the Metaspace size and the native memory area. By default the Metaspace and Native memory areas can grow to unlimited size, and they do not normally require tuning. Typical usage of each is only a few hundred MB.

It is important to understand that the Operating System itself and any other processes running on the machine have their own requirements regarding RAM and CPU. The Operating System uses a certain amount of RAM which leaves the remaining RAM to be split among Jenkins and any other processes on the machine.

Resolution

(This does not indicate a problem with Jenkins. It indicates that the Operating System is unable to provide enough resources for all the programs it has been asked to run.)

The Out Of Memory (OOM) Killer is a function of the Linux kernel that kills user processes when free RAM is very low, in order to prevent the whole system from going down due to lack of memory. It gives each process a score, based on heuristics, to decide which process to kill when the system is in such a state. The process that holds the most memory and does not release enough of it is the most likely to be killed. On a system where Jenkins is the primary service, it tends to be the process using the most RAM, and is therefore the most likely to be killed when system memory runs low.
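To see the score the kernel currently assigns to a process, you can read it from procfs. The following is a minimal sketch, assuming a Linux system that exposes /proc/<pid>/oom_score. The helper name oom_score_of and its optional second argument (an alternate proc root, useful only for testing) are hypothetical and not part of any standard tool.

```shell
# Print the kernel's current OOM score for a PID (a higher score means the
# process is more likely to be chosen by the OOM Killer).
# Assumes Linux procfs; oom_score_of is an illustrative helper name.
oom_score_of() {
  pid="$1"
  proc_root="${2:-/proc}"   # hypothetical override, defaults to the real /proc
  if [ -r "$proc_root/$pid/oom_score" ]; then
    cat "$proc_root/$pid/oom_score"
  else
    echo "no oom_score available for PID $pid" >&2
    return 1
  fi
}
```

For example, oom_score_of <JENKINS_PID> prints the score for the Jenkins process. Comparing it with the scores of other processes gives a rough idea of which one the kernel would pick first.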

If you are affected by this error, there could be different causes:

  1. Too much memory is allocated to Jenkins, and therefore not enough is left free for the Operating System and its housekeeping processes
  2. Other processes are running on the same machine as Jenkins and using too much memory

Following are recommendations for each case.

1) Too much memory allocated to Jenkins

You must not allocate all or nearly all of the system memory to the JVM where Jenkins is running. That is because the Operating System needs free memory for other housekeeping processes in addition to Jenkins.

We recommend keeping a minimum of 2-4GB of memory free for non-Jenkins processes. For example, if you are running on a system with 16GB of RAM, you should not allocate more than 12GB of heap for Jenkins. This is especially important in containerized environments, where it can be tempting to allocate small amounts of RAM to each container. When there is less than a 2GB difference between the maximum heap that Jenkins has been configured with and the total RAM available to the container, it is likely that Jenkins will eventually be killed by the kernel.
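The recommendation above can be turned into a quick sanity check. This is a minimal sketch under stated assumptions: the helper name heap_headroom_ok is hypothetical, all values are in MiB, and the default threshold of 2048 MiB mirrors the 2GB lower bound mentioned above.

```shell
# Check that the configured maximum heap (-Xmx) leaves enough RAM outside the heap.
# Usage: heap_headroom_ok TOTAL_RAM_MIB XMX_MIB [MIN_FREE_MIB]
# heap_headroom_ok is an illustrative helper name; 2048 MiB is the assumed default.
heap_headroom_ok() {
  total_mb="$1"
  xmx_mb="$2"
  min_free_mb="${3:-2048}"
  free_mb=$((total_mb - xmx_mb))
  if [ "$free_mb" -ge "$min_free_mb" ]; then
    echo "OK: ${free_mb} MiB left outside the heap"
  else
    echo "RISK: only ${free_mb} MiB left outside the heap (want >= ${min_free_mb})"
    return 1
  fi
}
```

With the 16GB example, heap_headroom_ok 16384 12288 prints "OK: 4096 MiB left outside the heap", while heap_headroom_ok 4096 3072 reports a risk. Keep in mind that the memory left outside the heap must also cover the Metaspace and native memory of Jenkins itself, not only the Operating System and other processes.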

2) Other processes are impacting Jenkins

In this scenario, Jenkins is not the only process running on the machine but it is killed because it is the process consuming the most memory on the OS.

We strongly recommend that Jenkins be the primary service/process running on the machine hosting it. If you run other processes, such as monitoring agents, ensure that they do not overload the system, or that the machine has enough resources to handle the combined load.

How to find the culprit

It is possible to check the processes consuming the most memory at any time on the machine with commands like:

$ top -o %MEM

or:

$ ps aux --sort -pmem

You can also view the kernel logs by running the command dmesg. In these logs, locate the “Out of memory: Kill process <JENKINS_PID>” message. Just above that message, the kernel dumps the stats of the processes that were running. For example:

[...]
[XXXXX] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[XXXXX] [  480]     0   480    13863      113      26        0         -1000 auditd
[XXXXX] [12345]   123 12345  4704977  3306330    6732        0             0 java
[XXXXX] [11939]     0 11939    46699      328      48        0             0 crond
[XXXXX] [11942]     0 11942    28282       45      12        0             0 sh
[XXXXX] [16789]   456 16789  1695936    38643     165        0             0 java
[...]
[XXXXX] Out of memory: Kill process 12345 (java) score 869 or sacrifice child
[XXXXX] Killed process 12345 (java) total-vm:18819908kB, anon-rss:13225320kB, file-rss:0kB, shmem-rss:0kB
[...]

In this example, the Jenkins PID was 12345 and it was killed. The summary (the "Killed process" line) shows that Jenkins was using ~13 GiB of memory (see anon-rss; the total-vm value can be disregarded). However, the table also lists another java process, with PID 16789, that has reserved ~6.5 GiB of virtual memory (total_vm), although only ~150 MiB of it was resident (rss) at the time of the kill. Note that memory values in the table are in 4 KiB pages, so you must multiply the rss value by 4 to determine actual RAM usage in KiB. You can then investigate this other process and see what it does by running the following command:

$ ps -f <pid>

It is possible that this process is leaking memory or perhaps just should not be running on the same system as Jenkins.
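The page-to-size conversion described above can be automated. The following is a minimal sketch, assuming 4 KiB pages and the table layout shown in the example (columns ending in total_vm, rss, nr_ptes, swapents, oom_score_adj, name). Newer kernels may print different columns, and process names containing spaces are not handled. The helper name oom_table_mib is hypothetical.

```shell
# Read kernel log lines on stdin and print PID, name, rss and total_vm in MiB
# for each row of the OOM process table.
# Assumes 4 KiB pages and the column layout shown in the example above;
# oom_table_mib is an illustrative helper name.
oom_table_mib() {
  awk '
    # Columns are counted from the right, so "[  480]" and "[12345]" are
    # handled the same way even though they split into different field counts.
    NF >= 9 &&
    $(NF-6) ~ /^[0-9]+$/ && $(NF-5) ~ /^[0-9]+$/ && $(NF-4) ~ /^[0-9]+$/ &&
    $(NF-1) ~ /^-?[0-9]+$/ {
      rss_mib = int($(NF-4) * 4 / 1024)   # pages -> KiB -> MiB
      vm_mib  = int($(NF-5) * 4 / 1024)
      printf "%s %s rss=%dMiB total_vm=%dMiB\n", $(NF-6), $NF, rss_mib, vm_mib
    }
  '
}
```

A typical invocation would be dmesg | oom_table_mib. For the example table, the row for PID 12345 becomes "12345 java rss=12915MiB total_vm=18378MiB" and the row for PID 16789 becomes "16789 java rss=150MiB total_vm=6624MiB".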

Resources

For more details about the OOM Killer and this particular issue, have a look at the following links:
