I have run the script listDProcessesNativeStacks.sh and shared the data with CloudBees Support.
What is this script, and how does it help us understand what is causing my master to hang?
- Linux environment
- CloudBees CI (CloudBees Core)
- CloudBees CI (CloudBees Core) on modern cloud platforms - Managed Master
- CloudBees CI (CloudBees Core) on modern cloud platforms - Operations Center
- CloudBees CI (CloudBees Core) on traditional platforms - Client Master
- CloudBees CI (CloudBees Core) on traditional platforms - Operations Center
- CloudBees Jenkins Enterprise
- CloudBees Jenkins Enterprise - Managed Master
- CloudBees Jenkins Enterprise - Operations Center
- CloudBees Jenkins Team
- CloudBees Jenkins Platform - Client Master
- CloudBees Jenkins Platform - Operations Center
- Jenkins LTS
The script listDProcessesNativeStacks.sh uses a combination of standard commands, including cat, to identify processes in a D state and dump their native stacks.
It is usually used together with the output of collectPerformanceData.sh to help us identify exactly what the native thread of a process is doing.
Where to retrieve the script
The script can be downloaded from this link.
What is a D state process
It is a process that is in an uninterruptible sleep. Usually this means that the process is waiting on I/O.
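As an illustration only, D state processes can be spotted by hand from the STAT column of ps. The sketch below is not taken from listDProcessesNativeStacks.sh. The function names filter_d_state and list_d_pids are hypothetical, and the ps options assume a procps-style ps.

```shell
# Keep only the PIDs whose STAT column (2nd field) starts with "D",
# i.e. processes in uninterruptible sleep. Expects "PID STAT COMMAND"
# rows on stdin, with a header line that is skipped.
filter_d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1 }'
}

# Hypothetical helper: list the PIDs currently in a D state.
# Assumes a procps-style ps that accepts -e and -o.
list_d_pids() {
    ps -eo pid,stat,comm | filter_d_state
}
```

Calling list_d_pids prints one PID per line. An empty result means no process was in a D state at that instant, so several samples may be needed while the hang is occurring.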
What does this script bring to collectPerformanceData
With the collectPerformanceData.sh script, we only have a Java view of the stack, so we cannot see what is happening at the OS level. For instance, in the following stack we can only infer that the JVM is trying to write something, but we have no idea what is happening at a lower level:
"Executor #-1 for master : executing myJob #11" Id=1305944 Group=main RUNNABLE (in native)
    at sun.nio.ch.FileDispatcherImpl.pwrite0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.pwrite(FileDispatcherImpl.java:66)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
    at sun.nio.ch.IOUtil.write(IOUtil.java:51)
All we can say is that the JVM is waiting on a native I/O operation. Running listDProcessesNativeStacks.sh in the same situation, we can extract more information:
jenkins+ 10656 4238440 2146620 ? D 00:00:00
[<ffffffff81168d1e>] sleep_on_page+0xe/0x20
[<ffffffff81168aa6>] wait_on_page_bit+0x86/0xb0
[<ffffffff81168be1>] filemap_fdatawait_range+0x111/0x1b0
[<ffffffff8116abff>] filemap_write_and_wait_range+0x3f/0x70
[<ffffffffa0422c7e>] nfs_file_fsync+0x7e/0x100 [nfs]
[<ffffffff8120ff8b>] vfs_fsync+0x2b/0x40
[<ffffffffa0422f0a>] nfs_file_flush+0x7a/0xb0 [nfs]
[<ffffffff811dc9f4>] filp_close+0x34/0x80
[<ffffffff811fd348>] __close_fd+0x78/0xa0
[<ffffffff811de103>] SyS_close+0x23/0x50
[<ffffffff81646d52>] tracesys+0xdd/0xe2
[<ffffffffffffffff>] 0xffffffffffffffff
The native stack shows the process blocked in nfs_file_fsync while closing a file, so we can now start investigating the NFS storage.
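On Linux, a kernel stack in this format is exposed per process through /proc/<pid>/stack, which is normally readable only by root. The following is a minimal sketch of dumping it for a single PID, assuming that interface is available on your kernel. It is not the content of listDProcessesNativeStacks.sh; the function name dump_native_stack and its optional second argument (an alternative /proc root, useful for testing) are hypothetical.

```shell
# Print the kernel (native) stack of one process.
#   $1: PID to inspect
#   $2: optional proc root, defaults to /proc (hypothetical parameter)
dump_native_stack() {
    pid=$1
    proc_root=${2:-/proc}
    stack_file="$proc_root/$pid/stack"

    if [ -r "$stack_file" ]; then
        echo "== PID $pid =="
        cat "$stack_file"
    else
        # Typical causes: not running as root, or the process has exited.
        echo "cannot read $stack_file" >&2
        return 1
    fi
}
```

For example, running dump_native_stack 10656 as root would print the stack of the process with PID 10656 if it still exists.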
But how exactly can you use this?
The script is simple to use. It is designed to work on any Linux system that provides cat (even the busybox version).
You need to run it with sudo or as the root user. It takes no parameters.
You can set the output directory with the D_PROCESSES_OUTPUT_DIR environment variable.
If you run the script with sudo, make sure to pass the environment variable through to it by using the -E switch, e.g.:
export D_PROCESSES_OUTPUT_DIR=/tmp
sudo -E ./listDProcessesNativeStacks.sh
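To illustrate why the -E switch matters, here is a minimal sketch of how a script might resolve its output directory from the environment. The function name resolve_output_dir and the fallback to the current directory are assumptions made for this example; the actual default used by listDProcessesNativeStacks.sh is not stated here.

```shell
# Print the directory where output would be written.
# Uses D_PROCESSES_OUTPUT_DIR when it is set and non-empty; otherwise
# falls back to the current directory (an assumed default, for illustration).
resolve_output_dir() {
    printf '%s\n' "${D_PROCESSES_OUTPUT_DIR:-$PWD}"
}
```

Without -E, sudo typically resets the environment, so the variable would be unset inside the script and a fallback like the one above would be used instead of the directory you exported.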
NOTE: The script will most likely not work from within a container. This should not be an issue, as D state processes should be visible from the host when running as the root user.
Make sure to attach the output of the script to the Support Ticket.