The following information should help guide you towards setting up NFS for usage with CloudBees Jenkins Enterprise and CloudBees Jenkins Operations Center when enabling High Availability.
This guide assumes that you are using a RHEL-based system or variant. If you are not, please use the content here as a framework for how your OS should be configured with its own processes and tooling.
1) Use NFSv3
NFSv4 and NFSv4.1+ have not been tested and will more than likely cause unexpected problems in your environment.
Minimal installations of RHEL do not provide NFS out of the box, so make sure to install the following package as root:
yum -y install nfs-utils
This package will pull in all the needed dependencies.
2) Disable Security Access Controls
On initial setup, it’s best to remove anything that will cause complications. You can always add these restrictions back as you’re prototyping out your infrastructure.
Disable the following components:
SELinux can be switched to "permissive" mode by editing /etc/sysconfig/selinux and setting:

SELINUX=permissive

You will only need to restart your operating system for the change to take effect if you wish to disable SELinux entirely by setting:

SELINUX=disabled

If you can't afford to restart the OS, you can switch to permissive mode immediately by running:

setenforce permissive

Red Hat generally encourages leaving SELinux in a permissive state rather than disabling it, in case you are interested in enabling it in the future. The logging data it collects in permissive mode greatly helps with creating security policies.
To disable the firewall, run the following commands on RHEL 6:

chkconfig iptables off
service iptables stop
To disable the firewall on a newer distro such as RHEL 7, run the following commands:

systemctl disable firewalld
systemctl stop firewalld
3) Configure NFS to use static ports
By default, NFS generates “dynamic” ports with the rpcbind daemon. This can result in sporadic behavior when using a corporate network that filters ports. It’s best to lock these ports down so that you can better predict behavior and provide your networking team with a definite list of exclusions to prevent downtime due to network security policy.
Edit the /etc/sysconfig/nfs file and set the following variables to ports of your choosing: MOUNTD_PORT, STATD_PORT, LOCKD_TCPPORT, and LOCKD_UDPPORT.
The following list should be documented internally and raised to your network/firewall team to ensure that these ports are not being blocked on your networks.
- 2049 - TCP/UDP
- 111 - TCP/UDP
- MOUNTD_PORT - TCP/UDP
- STATD_PORT - TCP/UDP
- LOCKD_TCPPORT - TCP
- LOCKD_UDPPORT - UDP
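As a sketch, the static port assignments in /etc/sysconfig/nfs might look like the following (the port numbers below are examples only; any unused ports that your network team approves will do):

```shell
# Example static port assignments in /etc/sysconfig/nfs
MOUNTD_PORT=892
STATD_PORT=662
LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32769
```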
4) Configure NFS to use more concurrent processes
By default, the NFS server starts only 8 server threads (nfsd processes). Larger systems may require more concurrency, so raising this value to at least 16 is preferable. You can make this change in the /etc/sysconfig/nfs file by setting the RPCNFSDCOUNT variable, which is commented out by default.
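For example, to raise the thread count to 16, the uncommented line in /etc/sysconfig/nfs would look like this:

```shell
# Number of nfsd server threads to start (8 by default when commented out)
RPCNFSDCOUNT=16
```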
5) Configure NFS to use more resources
Guides on tuning system memory should never be treated as a magic-bullet solution; however, we recommend that you at least start with this template and make adjustments as needed moving forward.
Add the following lines to your /etc/sysctl.conf file for anything running on RHEL 6 or lower. If you are using a newer distro, add this information to a new file under /etc/sysctl.d/ instead (for example, /etc/sysctl.d/nfs-tuning.conf; the exact file name is your choice):
sunrpc.tcp_slot_table_entries = 128
sunrpc.tcp_max_slot_table_entries = 128
net.core.rmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_default = 262144
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 262144 16777216
net.ipv4.tcp_wmem = 4096 262144 16777216
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.ipv4.ip_local_port_range = 1024 65000
fs.inode-max = 128000
fs.file-max = 64000
fs.nfs.nlm_tcpport = LOCKD_TCPPORT # from step 3
fs.nfs.nlm_udpport = LOCKD_UDPPORT # from step 3
The critical parameters to tune are the sunrpc values. By default, distributions such as Red Hat intentionally lower the tcp_slot_table_entries parameter to 2, which is not ideal for NFS servers, so it should be raised.
On RHEL 6 based systems or lower, you can apply these changes immediately by running sysctl -p after adding the information to /etc/sysctl.conf. Newer distros such as RHEL 7 invoke the command a little differently:

sysctl --system
Note that the sunrpc values will not take effect if the kernel has not loaded the sunrpc module before the sysctl values are applied at boot time. You can check whether the values are applying correctly by rebooting the OS and then querying them, for example:

sysctl sunrpc.tcp_slot_table_entries sunrpc.tcp_max_slot_table_entries
You can force the OS to load sunrpc early by adding a modprobe option. This should only be necessary if nfs-utils is not loading the module in an adequate amount of time during boot. Add the following line to a file under /etc/modprobe.d/:

options sunrpc tcp_slot_table_entries=128 tcp_max_slot_table_entries=128
6) Configure NFS server exports
Exports are defined in the /etc/exports file on the NFS server. When writing an export, restrict the allowed network (10.0.0.0/24 is just an example) to only the inbound clients for security purposes.
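A minimal /etc/exports entry might look like the following (the export path matches the mount examples used later in this guide; the rw,sync,no_root_squash options are a common starting point, not a requirement):

```shell
# /etc/exports — export the Jenkins home directory to the client network
/mnt/jenkins_home 10.0.0.0/24(rw,sync,no_root_squash)
```

After editing /etc/exports, run exportfs -ra to apply the changes without restarting the NFS service.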
7) Start NFS
Ensure that the rpcbind and nfs daemons are enabled at boot time and are switched on:
On RHEL 6:

chkconfig rpcbind on
chkconfig nfs on
service rpcbind start
service nfs start

On RHEL 7:

systemctl enable rpcbind
systemctl enable nfs
systemctl start rpcbind
systemctl start nfs
8) Configure NFS client mount point
Most guides would tell you to place the mount options in /etc/fstab, but we prefer that you use AutoFS instead. AutoFS is recommended because the AutoFS daemon will attempt to recover an NFS mount point that would otherwise stay down for good if fstab were used.
Minimal installations of RHEL will not include AutoFS, so install it with the following command on your client server:
yum -y install autofs
Once AutoFS is installed, you’ll need to modify a couple of files in order to start using it.
Define a mount point in /etc/auto.master.d/nfs.autofs:

/nfs /etc/auto.nfs --timeout 60 --ghost
Define a mapping in /etc/auto.nfs:

jenkinsHome -fstype=nfs,rw,bg,hard,intr,rsize=32768,wsize=32768,vers=3,proto=tcp,timeo=600,retrans=2,noatime,nodiratime,async 10.0.0.200:/mnt/jenkins_home
Create the new mountable folder:
mkdir -pv /nfs
chmod 775 /nfs
Ensure that the new files have permissions of 664. Executable permissions will prevent the automount from working correctly.
chmod 664 /etc/auto.nfs
chmod 664 /etc/auto.master.d/nfs.autofs
Finally, configure AutoFS to start at boot time and then start the daemon:
On RHEL 6:

chkconfig autofs on
service autofs start

On RHEL 7:

systemctl enable autofs
systemctl start autofs
When AutoFS kicks in, it will automatically mount your NFS share at /nfs/jenkinsHome. If you prefer a different location, adjust the path accordingly in the config files.
Older (risky) FSTab Method:
The fstab method is not advised because it cannot recover from failures. For instance, if an NFS mount were to suddenly go down, fstab has no way of recovering it and manual intervention would be required.
Edit the /etc/fstab file and add the following line to the bottom of the file:
10.0.0.200:/mnt/jenkins_home /mnt/nfs_jenkins_home nfs _netdev,rw,bg,hard,intr,rsize=32768,wsize=32768,vers=3,proto=tcp,timeo=600,retrans=2,noatime,nodiratime,async 0 0
Note that the _netdev param is essential. It prevents the OS from trying to mount the volume before the network interfaces have a chance to negotiate a connection.
After editing the fstab, always double-check your entry with the mount command before rebooting the OS; an invalid entry can force RHEL into recovery mode at boot:

mount -a -v
9) Configure Jenkins to use the NFS mount point
You can configure Jenkins to use a different home directory. To do this, edit the /etc/sysconfig/jenkins file and change the JENKINS_HOME variable so that it points at the NFS mount point.
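As a sketch, the relevant line in /etc/sysconfig/jenkins would then look like this (the /nfs/jenkinsHome path assumes the AutoFS configuration from step 8; adjust it if you chose a different mount point):

```shell
# Point Jenkins at the NFS-backed home directory
JENKINS_HOME="/nfs/jenkinsHome"
```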
Save the file and restart Jenkins.
service jenkins stop
service jenkins start
10) General Troubleshooting:
Check whether I/O operations are causing Jenkins master slowness
The top command on Unix reports the time the CPU spends waiting for I/O completion:

wa, IO-wait : time waiting for I/O completion

In the example below, we can see that the CPU was waiting for I/O completion only 0.3% of the time.
top - 11:12:06 up 1:11, 1 user, load average: 0.03, 0.02, 0.05
Tasks: 74 total, 2 running, 72 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 501732 total, 491288 used, 10444 free, 4364 buffers
KiB Swap: 0 total, 0 used, 0 free. 42332 cached Mem

  PID USER    PR NI    VIRT    RES  SHR S %CPU %MEM   TIME+ COMMAND
 1054 jenkins 20  0 4189920 332444 6960 S  0.7 66.3 1:01.26 java
 1712 vagrant 20  0  107700   1680  684 S  0.3  0.3 0:00.20 sshd
    1 root    20  0   33640   2280  792 S  0.0  0.5 0:00.74 init
    2 root    20  0       0      0    0 S  0.0  0.0 0:00.00 kthreadd
    3 root    20  0       0      0    0 S  0.0  0.0 0:00.05 ksoftirqd/0
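If you want to capture the wa value in a script, here is a minimal sketch using grep and awk on the %Cpu(s) line (the sample line below is copied from the output above):

```shell
# Extract the I/O-wait percentage from a `top -b -n 1` style %Cpu(s) line
line='%Cpu(s): 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st'
wa=$(printf '%s\n' "$line" | grep -o '[0-9.]* wa' | awk '{print $1}')
echo "$wa"   # prints 0.3
```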
Determine if the disk Jenkins is using might be causing the performance issues
At this point, we can use iostat to understand whether the mount we are using is causing the slowness.
$ iostat -x
Linux 3.13.0-77-generic (vagrant-ubuntu-trusty-64) 02/14/2017 _x86_64_ (1 CPU)

avg-cpu:  %user %nice %system %iowait %steal %idle
           1.24  0.00    0.20    0.03   0.00 98.54

Device: rrqm/s wrqm/s  r/s  w/s  rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda       0.05   1.80 9.00 1.24 292.35 23.47    61.72     0.00  0.34    0.35    0.26  0.23  0.23
hda       0.05   1.50 9.00 1.24 192.35 22.45    51.72     0.00  0.56    0.38    0.56  0.53  0.27
We need to look at:
r_await: The average time (in milliseconds) for read requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
w_await: The average time (in milliseconds) for write requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
%util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits.
Debugging information for RPC and NFS
Lots of things can go wrong, especially behind a complex network with rules that may be outside of your control. You can begin to make sense of what might be going wrong by enabling verbose debugging and checking the kernel system logs on both the client and the server.
Set the following two parameters to help generate more verbose debugging information for RPC and NFS:

sysctl -w sunrpc.rpc_debug=2048
sysctl -w sunrpc.nfs_debug=1
Utilizing a network monitor can also help explain odd transport issues such as packet loss or blocked ports. RHEL-based systems ship a simple tool called tcpdump that you can use to watch traffic.
This tool is very verbose, so it's a good idea to restrict the amount of data it reports by filtering out what you don't need. On the client and server, run the command like this to monitor all traffic on port 2049:
tcpdump -w /tmp/dump port 2049
Once you dump the data, you can use your favorite text editor to read it, or use the tcpdump tool to further filter it. The following example reads the dump file, filters the data based on an example source IP address, and then writes the results to a new file:

tcpdump -r /tmp/dump -w /tmp/smaller tcp and src 10.0.0.200
Other tools exist, such as nfstrace, but you will need to compile it yourself.
Services are running
Verify that both the nfs and rpcbind services are running.
If only the nfs service is running, you may face errors like java.nio.file.FileSystemException: /some/file: Device or resource busy, as explained in this article.