NFS Guide

The following information should help you set up NFS storage for use with CloudBees CI. It is assumed that a storage admin team will provide the storage solution and NFS volume, and that this team can help configure your clients (the CloudBees CI servers) to mount the volume. Our recommendations are based on our experience configuring NFS for the best performance under the typical I/O workload of Jenkins.

1) Use SSD disks if possible

Your NFS volume should use SSD disks if possible. SSDs can outperform spinning disks by roughly 10-20x.

2) Use NFS v4.1

CloudBees engineering has validated NFS v4.1 and it is the recommended NFS version for all Jenkins environments. File storage vendors have reported to CloudBees customers that there are known performance issues with v4.0. NFS v3 is known to be performant, but is considered insecure in most environments.

3) Configure NFS client mount point

An example /etc/fstab entry with our recommended mount options:

10.0.0.200:/mnt/jenkins_home /mnt/nfs_jenkins_home nfs _netdev,rw,bg,hard,intr,rsize=32768,wsize=32768,vers=4.1,proto=tcp,timeo=600,retrans=2,noatime,nodiratime,async 0 0

Note that the _netdev option is essential. It prevents the OS from trying to mount the volume before the network interfaces have had a chance to negotiate a connection.

We recommend read and write block sizes of rsize=32768,wsize=32768 because Jenkins performs a high volume of small reads and writes (mainly when working with build log files). These block sizes should yield the best performance for that workload.

The noatime,nodiratime,async settings are also important for best performance.
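To confirm that an fstab entry carries every option from the example above, a small check along these lines may help. It is only a sketch: missing_opts and the recommended variable are hypothetical names introduced here for illustration, and the mount point in the usage comment is the one from the example entry.

```shell
# Hypothetical helper: print each expected option that is absent from a
# comma-separated mount option string.
#   $1 = option string to inspect (for example, field 4 of an fstab entry)
#   $2 = comma-separated list of options that should be present
missing_opts() {
    printf '%s\n' "$2" | tr ',' '\n' | while read -r opt; do
        case ",$1," in
            *",$opt,"*) ;;                 # option present, nothing to report
            *) printf '%s\n' "$opt" ;;     # option missing
        esac
    done
}

# The options from the example entry above.
recommended="_netdev,rw,bg,hard,intr,rsize=32768,wsize=32768,vers=4.1,proto=tcp,timeo=600,retrans=2,noatime,nodiratime,async"

# Example usage (assumes the mount point from the example entry):
#   opts=$(awk '$2 == "/mnt/nfs_jenkins_home" { print $4 }' /etc/fstab)
#   missing_opts "$opts" "$recommended"
```

No output means every listed option was found. The check reads /etc/fstab rather than the live mount table because the kernel may report options in a different form than they were written.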

4) Configure Jenkins to use the NFS mount point

You can configure Jenkins to use a different home directory. To do this, edit the service config file (its location depends on your OS and version of CloudBees CI - see this guide for details) and change the JENKINS_HOME variable:

JENKINS_HOME="/mnt/nfs_jenkins_home"

Save the file and restart Jenkins.

service jenkins stop
service jenkins start
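Before the restart, it can be worth confirming that the new home directory really sits on the NFS mount and not on the local disk underneath it (which happens if the volume failed to mount). The following is a minimal sketch, assuming a Linux host where /proc/mounts is available; fstype_of is a hypothetical helper name, not part of CloudBees CI.

```shell
# Hypothetical helper: print the filesystem type backing a path by finding
# the longest matching mount point in a mounts table.
#   $1 = path to check
#   $2 = mounts table to read (optional, defaults to /proc/mounts)
fstype_of() {
    awk -v p="$1" '
        {
            mp = $2
            # A mount point matches if it is the root, equals the path,
            # or is a parent directory of the path.
            if (mp == "/" || p == mp || index(p, mp "/") == 1) {
                if (length(mp) >= best) { best = length(mp); type = $3 }
            }
        }
        END { print type }
    ' "${2:-/proc/mounts}"
}

# Example usage (JENKINS_HOME set as in the service config file):
#   case "$(fstype_of "$JENKINS_HOME")" in
#       nfs*) echo "JENKINS_HOME is on NFS" ;;
#       *)    echo "JENKINS_HOME is NOT on NFS" ;;
#   esac
```

The helper does not handle mount points containing spaces, which /proc/mounts writes in escaped form.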

5) General troubleshooting

Check whether I/O operations are slowing down the Jenkins master

The top command on Unix reports the percentage of time the CPU spends waiting for I/O completion:

wa, IO-wait : time waiting for I/O completion

In the example below, the CPU spent only 0.3% of its time waiting for I/O completion.

top - 11:12:06 up  1:11,  1 user,  load average: 0.03, 0.02, 0.05
Tasks:  74 total,   2 running,  72 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    501732 total,   491288 used,    10444 free,     4364 buffers
KiB Swap:        0 total,        0 used,        0 free.    42332 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 1054 jenkins   20   0 4189920 332444   6960 S  0.7 66.3   1:01.26 java
 1712 vagrant   20   0  107700   1680    684 S  0.3  0.3   0:00.20 sshd
    1 root      20   0   33640   2280    792 S  0.0  0.5   0:00.74 init
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:00.05 ksoftirqd/0
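To read the wa figure without scanning the full screen, the summary line can be filtered with awk. This is a sketch that assumes the summary line layout shown in the sample above (a number followed by "wa,"); other top versions may format the line differently. iowait_from_top is a hypothetical helper name.

```shell
# Hypothetical helper: read top output on stdin and print the IO-wait
# percentage from the Cpu(s) summary line.
iowait_from_top() {
    awk '/Cpu\(s\)/ {
        # Each value precedes its label, so print the field before "wa".
        for (i = 2; i <= NF; i++)
            if ($i ~ /^wa,?$/) { print $(i - 1); exit }
    }'
}

# Example usage (batch mode, single iteration):
#   top -b -n 1 | iowait_from_top
```

Run against the sample output above, the helper prints 0.3.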

Determine whether the disk Jenkins uses is causing the performance issues

At this point, we can use nfsiostat to find out whether the mount we are using is causing the slowness. It’s best to run it several times, a few seconds apart, to get a sense of performance over time, because the results vary with the amount of I/O activity. nfsiostat 2 20 will run it every two seconds for twenty runs.

# nfsiostat

10.130.12.150:/data01 mounted on /data01:

   op/s         rpc bklog
   0.08            0.00
read:             ops/s            kB/s           kB/op         retrans         avg RTT (ms)    avg exe (ms)
                  0.052           6.436         124.154        0 (0.0%)           9.365           9.617
write:            ops/s            kB/s           kB/op         retrans         avg RTT (ms)    avg exe (ms)
                  0.001           0.214         199.536        0 (0.0%)           5.673          72.526

The main figures of interest are avg RTT (the time from when the client’s kernel sends the RPC request until it receives the reply) and avg exe (the time from when the NFS client hands the RPC request to its kernel until the request completes, which includes the RTT). It’s normal for reads to be faster than writes, but you would not want to see exe times above 100 ms on a busy system.
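When nfsiostat runs repeatedly, picking out the slow samples by eye gets tedious. The sketch below filters its output and prints only the read or write rows whose avg exe exceeds a limit (100 ms unless another value is given). It assumes the column layout of the sample output above, where avg exe is the seventh whitespace-separated field of each data row; slow_nfs_ops is a hypothetical helper name.

```shell
# Hypothetical helper: read nfsiostat output on stdin and print
# "<export> <read|write> <avg exe>" for rows above the limit.
#   $1 = limit in milliseconds (optional, defaults to 100)
slow_nfs_ops() {
    awk -v limit="${1:-100}" '
        /mounted on/     { mount = $1 }                    # remember the export
        /^(read|write):/ { op = $1; sub(/:$/, "", op); want = 1; next }
        want {
            # The row after a read: or write: header holds the values.
            want = 0
            if ($7 + 0 > limit + 0) print mount, op, $7
        }
    '
}

# Example usage:
#   nfsiostat 2 20 | slow_nfs_ops
#   nfsiostat 2 20 | slow_nfs_ops 50    # stricter limit of 50 ms
```

With the sample output above, the default limit prints nothing, while a limit of 50 reports the write row at 72.526 ms.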

Conclusion

For further assistance with using NFS, including performance issues, we recommend working with your storage admin team, or contacting CloudBees Support.
