What is listDProcessesNativeStacks.sh and how does it help?

Issue

I have run the script listDProcessesNativeStacks.sh and shared the data with CloudBees Support.
What is it, and how does it help to understand what is causing my master to hang?

Environment

Resolution

The script listDProcessesNativeStacks.sh uses a combination of ps, awk and cat to identify processes in the D state
and dump their native (kernel) stacks.
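To make the mechanism concrete, here is a minimal sketch of the same idea. It is not the actual listDProcessesNativeStacks.sh: it assumes the procps version of ps (for the -eo pid=,stat= option), and the function names filter_d_pids, list_d_pids and dump_kernel_stack are hypothetical. Run it as root, since /proc/<pid>/stack is normally readable only by root.

```shell
#!/bin/sh
# Hedged sketch only: this is NOT the actual listDProcessesNativeStacks.sh.
# Assumes the procps version of ps (for "-eo pid=,stat=") and root access,
# because /proc/<pid>/stack is normally readable only by root.

# Read "PID STAT" lines on stdin and print the PIDs whose state starts
# with D (uninterruptible sleep). STAT may carry extra flags, e.g. "D+".
filter_d_pids() {
    awk '$2 ~ /^D/ { print $1 }'
}

# List the PIDs of all processes currently in D state ("=" drops headers).
list_d_pids() {
    ps -eo pid=,stat= | filter_d_pids
}

# Print a header and the kernel stack of one PID. The process may have
# left the D state or exited by the time we read it, so failures are tolerated.
dump_kernel_stack() {
    echo "=== PID $1 ==="
    cat "/proc/$1/stack" 2>/dev/null || echo "(kernel stack not readable)"
}

for pid in $(list_d_pids); do
    dump_kernel_stack "$pid"
done
```

Because the processes are only sampled once, a real capture would typically be repeated a few times to see whether the same PIDs stay blocked in the same kernel functions.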

It is usually used together with the output of jenkinshangWithJstack.sh, to help us identify exactly what the native
thread of a process is doing.

What is a D state process?

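It is a process in uninterruptible sleep: it is blocked inside the kernel and generally does not respond to signals
until the operation it is waiting on completes. Usually this means that the process is waiting on I/O, for example on
a local disk or a network file system.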
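To check the state of a single process by hand, Linux exposes it under /proc. The hedged sketch below (the function name state_of is hypothetical) reads the one-letter state from /proc/<pid>/stat and the kernel function the task is sleeping in from /proc/<pid>/wchan.

```shell
#!/bin/sh
# Hedged sketch: show the state letter and kernel wait channel of one
# process, read straight from /proc (Linux only). Not part of the script.

# Read a /proc/<pid>/stat line on stdin and print its state letter.
# Field 2 (the command name) is in parentheses and may itself contain
# spaces or ")", so strip everything up to the LAST ") " first.
state_of() {
    sed 's/.*) //' | cut -d' ' -f1
}

pid=${1:-$$}    # PID to inspect; defaults to this shell
if [ -r "/proc/$pid/stat" ]; then
    # wchan names the kernel function the task is sleeping in
    # (it may print 0 when the information is not available)
    printf 'PID %s state=%s wchan=%s\n' "$pid" \
        "$(state_of < "/proc/$pid/stat")" \
        "$(cat "/proc/$pid/wchan" 2>/dev/null)"
fi
```

For example, sh ./showstate.sh 10656 (a hypothetical file name) prints the state of PID 10656. With the procps ps, ps -eo pid,stat,wchan:32,comm gives a similar overview for all processes at once.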

What does this script bring to jenkinshangWithJstack.sh?

With the jenkinshangWithJstack.sh script, we only get a Java view of the stack, so we are missing what is happening
at the OS level. For instance, in the following stack we can only infer that the JVM is trying to write something, but
we have no idea what is happening at a lower level:

"Executor #-1 for master : executing myJob #11" Id=1305944 Group=main RUNNABLE (in native)
    at sun.nio.ch.FileDispatcherImpl.pwrite0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.pwrite(FileDispatcherImpl.java:66)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
    at sun.nio.ch.IOUtil.write(IOUtil.java:51)

All we can say is that the JVM is waiting on a native I/O operation. By running listDProcessesNativeStacks.sh in
this situation, we can extract more information:

jenkins+ 10656 4238440 2146620 ?     D    00:00:00
[<ffffffff81168d1e>] sleep_on_page+0xe/0x20
[<ffffffff81168aa6>] wait_on_page_bit+0x86/0xb0
[<ffffffff81168be1>] filemap_fdatawait_range+0x111/0x1b0
[<ffffffff8116abff>] filemap_write_and_wait_range+0x3f/0x70
[<ffffffffa0422c7e>] nfs_file_fsync+0x7e/0x100 [nfs]
[<ffffffff8120ff8b>] vfs_fsync+0x2b/0x40
[<ffffffffa0422f0a>] nfs_file_flush+0x7a/0xb0 [nfs]
[<ffffffff811dc9f4>] filp_close+0x34/0x80
[<ffffffff811fd348>] __close_fd+0x78/0xa0
[<ffffffff811de103>] SyS_close+0x23/0x50
[<ffffffff81646d52>] tracesys+0xdd/0xe2
[<ffffffffffffffff>] 0xffffffffffffffff

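The kernel stack shows that the process is blocked in the NFS client code (nfs_file_flush, nfs_file_fsync), waiting
for its pages to be written back while closing a file. We now know that the investigation should focus on the NFS mount.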
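A JVM runs many threads, and /proc/<pid>/stack only shows the thread whose ID equals the PID. One way to line up kernel stacks with a Java thread dump, assuming the dump contains jstack's nid=0x... field (the native thread ID in hexadecimal on Linux), is to walk /proc/<pid>/task/ and label each D-state thread with its nid. This is a hedged sketch, not part of the CloudBees scripts; nid_to_tid, state_of and d_thread_stacks are hypothetical names.

```shell
#!/bin/sh
# Hedged sketch: list the D-state threads of one JVM and print each
# thread's kernel stack, labelled with the hex "nid" that jstack uses.
# Assumes Linux /proc and root access; not part of the CloudBees scripts.

# Convert a jstack nid such as 0x29a0 to the decimal Linux thread ID.
nid_to_tid() {
    printf '%d\n' "$1"
}

# Print the state letter from a /proc/.../stat line read on stdin.
state_of() {
    sed 's/.*) //' | cut -d' ' -f1
}

# Walk every thread (task) of the given PID and dump those in D state.
d_thread_stacks() {
    for task in /proc/"$1"/task/*; do
        [ -r "$task/stat" ] || continue   # thread exited, or no such PID
        if [ "$(state_of < "$task/stat")" = "D" ]; then
            tid=${task##*/}
            printf '=== TID %s (nid=0x%x) ===\n' "$tid" "$tid"
            cat "$task/stack" 2>/dev/null || echo "(kernel stack not readable)"
        fi
    done
}

if [ -n "${1:-}" ]; then
    d_thread_stacks "$1"
fi
```

For example, sudo sh ./dthreads.sh 10656 (a hypothetical file name) would print every D-state thread of PID 10656; a thread labelled nid=0x29a0 would correspond to the jstack entry carrying that nid.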

But how exactly can you use this?

The script is simple to use. It is designed to work on any Linux system with ps, awk and cat (even with the BusyBox
version of ps).
You need to run it with sudo or as the root user. It takes no parameters.
You can set the output directory with the D_PROCESSES_OUTPUT_DIR environment variable.
If you run it with sudo, make sure to pass the environment variable to the script by using the -E switch, e.g.:

export D_PROCESSES_OUTPUT_DIR=/tmp
sudo -E ./listDProcessesNativeStacks.sh
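If the output does not land where you expect, a quick pre-flight check run through the same sudo -E invocation can confirm that you are root and that D_PROCESSES_OUTPUT_DIR survived sudo. This is a hypothetical helper, not part of the CloudBees script; output_dir_ok and preflight are illustrative names. Note that the sudoers policy may refuse -E.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the CloudBees script): run it
# in the same sudo -E context to confirm root access and that
# D_PROCESSES_OUTPUT_DIR survived sudo and points to a writable directory.

# Succeed only if the argument is an existing, writable directory.
output_dir_ok() {
    [ -d "$1" ] && [ -w "$1" ]
}

preflight() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "warning: not root; kernel stacks will not be readable" >&2
    fi
    if [ -z "${D_PROCESSES_OUTPUT_DIR:-}" ]; then
        echo "note: D_PROCESSES_OUTPUT_DIR is unset (sudo without -E?)" >&2
    elif ! output_dir_ok "$D_PROCESSES_OUTPUT_DIR"; then
        echo "error: $D_PROCESSES_OUTPUT_DIR is not a writable directory" >&2
        return 1
    fi
    return 0
}

preflight
```

For example: sudo -E sh ./preflight.sh && sudo -E ./listDProcessesNativeStacks.sh (preflight.sh being a hypothetical file holding the sketch).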

Make sure to attach the output of the script to the Support Ticket.
