There appears to be a connectivity issue between two components (be it CJOC, a Client Master, an agent, or anything else that requires network connectivity).
The support bundle and the information you provided suggest that the issue could be coming from something in between those components, for example a proxy, a load balancer or a firewall.
We therefore need to gather network dumps from each side to understand where the connection might have been reset or cut.
On each node requested by support, run the following commands in the background.
For example, if you have issues between a CJOC, a Client Master and one agent, run them on all three nodes.
Command(s) to run
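$ export CASEID=CHANGE_IT_TO_THE_ZENDESK_ISSUE_ID
$ mkdir zd-$CASEID-tcpdump-$(hostname)
$ cd zd-$CASEID-tcpdump-$(hostname)
$ nohup sudo tcpdump -i eth0 -s 1522 -C 10 -w tcpdump.cap -W 100 -Z root &

`-C 10` starts a new file every 10 MB (tcpdump counts `-C` in millions of bytes) and `-W 100` limits the ring to 100 files. `-Z root` stops tcpdump from dropping privileges, so it can still create the rotated files in this directory. Replace `eth0` if the node's network interface has another name; `sudo tcpdump -D` lists the available interfaces. If `sudo` asks for a password, run `sudo -v` first so that the backgrounded command does not stall waiting for it.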
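To confirm the capture actually started (for example, that the interface name was valid), a check along these lines can be run from the capture directory. This is a minimal sketch assuming a Linux node with `pgrep` from procps installed; `check_capture` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: check that the background capture is alive and
# that rotating files are being written in the given directory.
# Assumes pgrep (procps) is installed.
check_capture() {
    dir="${1:-.}"
    if ! pgrep -x tcpdump >/dev/null 2>&1; then
        # nohup writes tcpdump's messages to nohup.out in the directory it ran from.
        echo "tcpdump is not running; see $dir/nohup.out for errors" >&2
        return 1
    fi
    echo "tcpdump is running"
    # With -W 100, files are named tcpdump.cap00, tcpdump.cap01, ...
    ls -l "$dir"/tcpdump.cap*
}
```

Paste the function into the shell, then run `check_capture .` a few seconds after starting the capture.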
When do I run this command?
You can, and should, run it before the issue occurs again.
This sets up a ring of rotating dump files and writes the network traffic to them, so it uses at most about 1 GB of disk on each node (100 files x ~10 MB), and no more.
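The ~1 GB figure is the file count times the file size: tcpdump counts `-C` in millions of bytes, so 100 x 10,000,000 bytes is 10^9 bytes, or roughly 976,562 KiB. The sketch below checks, before you start the capture, that the filesystem holding the capture directory has that much free space. `FILE_SIZE_MB`, `FILE_COUNT` and the helper names are illustrative assumptions and must match the `-C` and `-W` values you actually use.

```shell
# Values mirroring -C (millions of bytes per file) and -W (number of files).
FILE_SIZE_MB=10
FILE_COUNT=100

# Worst-case size of the capture ring, in 1024-byte blocks.
needed_kb() {
    echo $(( $1 * $2 * 1000000 / 1024 ))
}

# Free 1024-byte blocks on the filesystem holding directory $1.
# df -P prints one line per filesystem; column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Compare the two and report; returns non-zero if space is short.
check_space() {
    need=$(needed_kb "$FILE_SIZE_MB" "$FILE_COUNT")
    have=$(free_kb "${1:-.}")
    if [ "$have" -ge "$need" ]; then
        echo "OK: $have KB free, capture needs up to $need KB"
    else
        echo "Not enough space: $have KB free, capture needs up to $need KB" >&2
        return 1
    fi
}
```

For example, `check_space .` run inside the capture directory prints whether roughly 1 GB is available there.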
What do I do when the issue has occurred again?
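Before creating the archive, it is safer to stop the capture so that the file tcpdump is writing at that moment is flushed to disk. A minimal sketch, assuming Linux with procps `pkill`/`pgrep`; `stop_capture` is a hypothetical helper, and because it signals every process named `tcpdump` on the node, adapt it if other captures run there.

```shell
# Hypothetical helper: stop the background capture before archiving it.
# SIGINT is what Ctrl-C sends; tcpdump then exits cleanly.
# Note: this signals every process named exactly "tcpdump" on the node.
stop_capture() {
    sudo pkill -INT -x tcpdump
    # Give tcpdump up to 10 seconds to exit.
    i=0
    while pgrep -x tcpdump >/dev/null 2>&1 && [ "$i" -lt 10 ]; do
        sleep 1
        i=$((i + 1))
    done
    if pgrep -x tcpdump >/dev/null 2>&1; then
        echo "tcpdump is still running" >&2
        return 1
    fi
}
```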
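Zip the directory you created above and send it our way. Run the command from its parent directory (`cd ..` if you are still inside it), and re-run the `export CASEID=...` line first if you are in a new shell.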
$ zip -9 -r zd-$CASEID-tcpdump-$(hostname).zip zd-$CASEID-tcpdump-$(hostname)
The archive will most probably be too large to attach directly; see [How to send a file that is too large for Zendesk](How to send a file that is too large for zendesk.md) to send us big files.
- Make sure to tell us the exact time window in which you saw the issue occur again.
This is required for us to correlate timestamps across the network dumps, the support bundle data and any other information you have provided.
- IMPORTANT: the machines' clocks must be synchronized (for example with NTP) for us to be able to analyze the dumps.
- Please also provide a new support bundle from each node after the issue occurs again. That may help us correlate network packets with potential error logs.
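To report the time window unambiguously and check clock synchronization, you can print each node's clock in UTC and ask systemd whether the clock is synchronized. This is a sketch assuming systemd-based Linux nodes (`timedatectl`); the helper names are hypothetical, and older systemd versions print `NTP synchronized` rather than `System clock synchronized`, hence the loose match.

```shell
# Hypothetical helper: current time in ISO 8601 UTC, e.g. 2024-05-17T14:03:22Z.
# Using this format avoids time zone ambiguity when reporting the issue window.
utc_now() {
    date -u '+%Y-%m-%dT%H:%M:%SZ'
}

# Hypothetical helper: show whether systemd considers the clock synchronized.
# Matches both "System clock synchronized: yes" and "NTP synchronized: yes".
clock_sync_status() {
    timedatectl status | grep -i 'synchronized'
}
```

For example, run `echo "$(hostname): $(utc_now)"` and `clock_sync_status` on every node at about the same moment; the timestamps should match closely and each node should report `yes`.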