NFS Guide

The following information should help you set up NFS for use with CloudBees Jenkins Enterprise and CloudBees Jenkins Operations Center when enabling High Availability.

This guide assumes that you are using a RHEL-based system or variant. If you are not, please use the content here as a framework for configuring your OS with its own processes and tooling.

1) Use NFSv3

NFSv4/NFSv4.1+ have not been tested and will more than likely cause unexpected problems in your environment.

Minimal installations of RHEL do not provide NFS out of the box, so make sure to install the following package as root:

yum -y install nfs-utils

This package will pull in all the needed dependencies.

2) Disable Security Access Controls

On initial setup, it’s best to remove anything that could cause complications. You can always add these restrictions back as you prototype your infrastructure.

Disable the following components:

  • SELinux
  • IPTables/FirewallD

SELinux can be switched over to “permissive” mode by editing /etc/sysconfig/selinux and setting SELINUX to:

SELINUX=permissive

You will need to restart the operating system for the change to take effect, but only if you wish to disable SELinux entirely by setting it to:

SELINUX=disabled

You can also run setenforce 0 to switch to permissive mode immediately if you can’t afford to restart the OS. Red Hat generally encourages leaving SELinux in a permissive state in case you are interested in enabling it in the future: the logging data it collects in permissive mode greatly helps with creating security policies.
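
You can confirm the active SELinux mode at any time with the following commands:

getenforce
sestatus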

To disable the firewall on RHEL6, run the following commands:

chkconfig iptables off
service iptables stop

To disable the firewall on a newer distro such as RHEL7, run the following commands:

systemctl disable firewalld
systemctl stop firewalld

3) Configure NFS to use static ports

By default, NFS assigns “dynamic” ports through the rpcbind daemon. This can result in sporadic behavior on a corporate network that filters ports. It’s best to lock these ports down so that you can better predict behavior and provide your networking team with a definite list of exclusions, preventing downtime due to network security policy.

Edit the /etc/sysconfig/nfs file and set the following variables to ports of your choosing (a sample snippet follows the list):

  • MOUNTD_PORT=port
  • STATD_PORT=port
  • LOCKD_TCPPORT=port
  • LOCKD_UDPPORT=port
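
For example, a minimal /etc/sysconfig/nfs snippet might look like the following; the port numbers here are arbitrary examples, so substitute values that are free in your environment:

MOUNTD_PORT=20048
STATD_PORT=20049
LOCKD_TCPPORT=20050
LOCKD_UDPPORT=20051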

The following port list should be documented internally and raised with your network/firewall team to ensure that these ports are not being blocked on your networks.

Port List:
- 2049 - TCP/UDP
- 111 - TCP/UDP
- MOUNTD_PORT - TCP/UDP
- STATD_PORT - TCP/UDP
- LOCKD_TCPPORT - TCP
- LOCKD_UDPPORT - UDP

4) Configure NFS to use more concurrent processes

By default, the NFS server starts only 8 server threads (nfsd processes). Larger systems may require more concurrency, so raising this value to 16 is preferable. You can make this change in the /etc/sysconfig/nfs file by setting the RPCNFSDCOUNT variable, which is commented out by default.

RPCNFSDCOUNT=16
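
After the NFS service has been restarted (see step 7), you can verify the running thread count; the first number on the “th” line is the number of nfsd threads:

grep ^th /proc/net/rpc/nfsd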

5) Configure NFS to use more resources

Guides on tuning system memory should never be treated as a magic-bullet solution; however, we recommend that you at least start with this template and make adjustments as needed moving forward.

Add the following lines to your /etc/sysctl.conf file on RHEL6 or lower. If you are using a newer distro, add this information to a new file called /etc/sysctl.d/30-nfs.conf.

sunrpc.tcp_slot_table_entries = 128
sunrpc.tcp_max_slot_table_entries = 128
net.core.rmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_default = 262144
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 262144 16777216
net.ipv4.tcp_wmem = 4096 262144 16777216
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.ipv4.ip_local_port_range = 1024 65000
fs.inode-max = 128000
fs.file-max = 64000
fs.nfs.nlm_tcpport = LOCKD_TCPPORT #from step 3
fs.nfs.nlm_udpport = LOCKD_UDPPORT #from step 3

The critical parameters to tune are the sunrpc values. By default, distributions such as Red Hat intentionally lower the tcp_slot_table_entries parameter to 2, which is not ideal for NFS servers and should therefore be raised.

On RHEL6-based systems or lower, you can apply these changes immediately by running sysctl -p after adding the information to /etc/sysctl.conf. Newer distros such as RHEL7 invoke the command a little differently with sysctl --system.

Note that the sunrpc values will not be applied if the kernel has not loaded the sunrpc module before the sysctl values are set at boot time. You can check whether the values applied correctly by rebooting the OS and running the following command:

sysctl sunrpc.tcp_max_slot_table_entries

You can force the OS to load sunrpc by adding a modprobe configuration. This should only be necessary if nfs-utils is not loading the module early enough during boot.

/etc/modprobe.d/sunrpc.conf:

options sunrpc tcp_slot_table_entries=128 tcp_max_slot_table_entries=128
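
After a reboot, you can confirm that the module picked up these options by reading the parameters back from sysfs:

cat /sys/module/sunrpc/parameters/tcp_slot_table_entries
cat /sys/module/sunrpc/parameters/tcp_max_slot_table_entries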

6) Configure NFS server exports

Below is an example export to add to /etc/exports. The 10.0.0.0/24 network is just an example; for security purposes, you could restrict it to just the inbound client IPs.

/mnt/jenkins_home 10.0.0.0/24(rw,async,no_root_squash)
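
Once the NFS service is running (step 7), you can apply changes to /etc/exports without a restart and verify what is being exported:

exportfs -ra
exportfs -v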

7) Start NFS

Ensure that the rpcbind and nfs daemons are enabled at boot time and are started:

RHEL6:

chkconfig rpcbind on
chkconfig nfs on
service rpcbind start
service nfs start

RHEL7:

systemctl enable rpcbind
systemctl enable nfs
systemctl start rpcbind
systemctl start nfs
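
With the daemons running, you can confirm that the static ports configured in step 3 took effect by listing the registered RPC services:

rpcinfo -p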

8) Configure NFS client mount point

Below are mount options to start with. Note that most guides will tell you to place these in /etc/fstab, but we’d prefer that you use AutoFS instead.

AutoFS Method:

AutoFS is recommended because the AutoFS daemon will attempt to recover an NFS mount point that would otherwise go down for good if fstab were used.

Minimal installations of RHEL will not include AutoFS, so install it with the following command on the client machine:

yum -y install autofs

Once AutoFS is installed, you’ll need to modify a couple of files in order to start using it.

Define a mount point in /etc/auto.master.d/nfs.autofs:

/nfs /etc/auto.nfs --timeout 60 --ghost

Define a mapping in /etc/auto.nfs:

jenkinsHome -fstype=nfs,rw,bg,hard,intr,rsize=32768,wsize=32768,vers=3,proto=tcp,timeo=600,retrans=2,noatime,nodiratime,async 10.0.0.200:/mnt/jenkins_home

Create the new mount point directory:

mkdir -pv /nfs
chmod 775 /nfs

Ensure that the new map files are chmod as 664; executable permissions will prevent the automount from working correctly, because AutoFS treats executable map files as program maps rather than plain maps.

chmod 664 /etc/auto.nfs
chmod 664 /etc/auto.master.d/nfs.autofs

Finally, configure AutoFS to start at boot time and then start the daemon:

RHEL6:

chkconfig autofs on
service autofs start

RHEL7:

systemctl enable autofs
systemctl start autofs

When AutoFS kicks in, it will automatically mount your NFS share at /nfs/jenkinsHome. If you prefer a different location, adjust the paths accordingly in the config files.
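
Because of the --ghost option, the jenkinsHome directory is visible under /nfs before it is mounted, and simply accessing it triggers the mount. A quick way to test the setup:

ls /nfs/jenkinsHome
mount | grep jenkinsHome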

Older (risky) FSTab Method:

The fstab method is not advised due to its nature. For instance, if an NFS mount were to suddenly go down, fstab has no way of recovering it and manual intervention would be required.

Edit the /etc/fstab file and add the following line to the bottom of the file:

10.0.0.200:/mnt/jenkins_home /mnt/nfs_jenkins_home nfs _netdev,rw,bg,hard,intr,rsize=32768,wsize=32768,vers=3,proto=tcp,timeo=600,retrans=2,noatime,nodiratime,async 0 0

Note that the _netdev option is essential. It prevents the OS from trying to mount the volume before the network interfaces have a chance to negotiate a connection.

After editing the fstab, always double-check your entry with the mount command before rebooting the OS. A broken entry will force RHEL into recovery mode at boot.

mount -a -v

9) Configure Jenkins to use the NFS mount point

You can configure Jenkins to use a different home directory. To do this, edit the /etc/sysconfig/jenkins file and change the JENKINS_HOME variable:

JENKINS_HOME="/nfs/jenkinsHome"

Save the file and restart Jenkins.

service jenkins stop
service jenkins start
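
If this master already has data in an existing home directory (/var/lib/jenkins by default on RHEL packages), copy it over to the NFS mount while Jenkins is stopped; a minimal sketch, assuming that default path:

rsync -av /var/lib/jenkins/ /nfs/jenkinsHome/  # trailing slashes copy the directory contents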

10) General Troubleshooting:

Check whether I/O operations are causing Jenkins master slowness

The top command on Unix reports the time the CPU spends waiting for I/O completion:

wa, IO-wait : time waiting for I/O completion

In the example below, we can see that the CPU was waiting for I/O completion only 0.3% of the time.

top - 11:12:06 up  1:11,  1 user,  load average: 0.03, 0.02, 0.05
Tasks:  74 total,   2 running,  72 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    501732 total,   491288 used,    10444 free,     4364 buffers
KiB Swap:        0 total,        0 used,        0 free.    42332 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 1054 jenkins   20   0 4189920 332444   6960 S  0.7 66.3   1:01.26 java
 1712 vagrant   20   0  107700   1680    684 S  0.3  0.3   0:00.20 sshd
    1 root      20   0   33640   2280    792 S  0.0  0.5   0:00.74 init
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:00.05 ksoftirqd/0

Determine whether the disk Jenkins is using might be causing the performance issues

At this point, we can use iostat to understand whether the mount we are using is causing the slowness.

$ iostat -x
Linux 3.13.0-77-generic (vagrant-ubuntu-trusty-64) 	02/14/2017 	_x86_64_	(1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.24    0.00    0.20    0.03    0.00   98.54

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.05     1.80    9.00    1.24   292.35    23.47    61.72     0.00    0.34    0.35    0.26   0.23   0.23

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
hda               0.05     1.50    9.00    1.24   192.35    22.45    51.72     0.00    0.56    0.38    0.56   0.53   0.27

We need to look at:
* r_await: The average time (in milliseconds) for read requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
* w_await: The average time (in milliseconds) for write requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
* %util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits.
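
Note that iostat without arguments reports averages since boot. To watch live behavior while reproducing the slowness, run it with an interval; for example, extended device statistics every 5 seconds:

iostat -xd 5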

Debugging information for RPC and NFS

Lots of things can go wrong, especially behind a complex network with rules that may be outside of your control. You can begin to make sense of what might be going wrong by enabling verbose debugging and checking the kernel system logs on both the client and the server.

Set the following two parameters to generate more verbose debugging information for RPC and NFS:

sysctl -w sunrpc.rpc_debug=2048
sysctl -w sunrpc.nfs_debug=1
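
The debug output is written to the kernel log, so watch dmesg or /var/log/messages while reproducing the issue. When finished, turn the extra logging off again:

sysctl -w sunrpc.rpc_debug=0
sysctl -w sunrpc.nfs_debug=0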

Network Monitor

Utilizing a network monitor can also help explain odd transport issues such as packet loss or blocked ports. RHEL-based systems ship a simple tool called tcpdump that you can use to watch traffic.

This tool is very verbose, so it’s a good idea to restrict the amount of data it reports by filtering out what you don’t need. On the client and server, run the command like this to monitor all the traffic on port 2049:

tcpdump -w /tmp/dump port 2049

Once you dump the data, you can read it back with tcpdump (the capture file is binary, so a text editor will not be much help) or use tcpdump to filter it further. The following example reads the dump file, filters the data based on an example source IP address, and then writes the results to a new file.

tcpdump -r /tmp/dump -w /tmp/smaller tcp and src 10.0.0.200

Other tools exist, such as nfstrace, but you’ll need to compile it yourself:
https://github.com/epam/nfstrace
