Long term idle connections get dropped

I have a ZeroTier setup consisting of one network controller and one ZeroTier node in router mode. The router node has a long-lived connection to the network controller. Several ephemeral CI machines access the services behind the router at intervals of 30 minutes or more, and the connection to the service sometimes drops during CI builds.

I have been looking for the root cause of this problem for a few weeks now, but so far I haven't been able to pin it down. Is there any way to gather more information that would help locate the source of these issues and troubleshoot them?

Hi Drakonis,

To troubleshoot the intermittent connection drops in your ZeroTier setup, especially with ephemeral CI machines accessing services behind a router, you can gather more information using the following steps:

1. Monitor Network Connectivity

  • Ping Test: Set up continuous pings between the CI machines and the services behind the router to detect when connectivity drops. This helps pinpoint exactly when the connection issues occur.
    ping -i 1 <router-ip>
    
  • Traceroute: Run traceroute during the CI build and when the connection drops to check for any issues along the path:
    traceroute <router-ip>
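
A drop is easier to match against a CI build if every ping result carries a timestamp. The following is a minimal sketch, not a definitive tool: the function names stamp_lines and monitor_ping and the default log file name are made up for this example, and it assumes a Linux ping that prints one line per reply.

```shell
# Prefix every line read from stdin with a UTC timestamp.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line"
    done
}

# Ping the target once per second and append timestamped output to a log.
# Usage: monitor_ping <router-ip> [logfile]   (logfile name is a placeholder)
monitor_ping() {
    local target="$1"
    local logfile="${2:-ping-monitor.log}"
    ping -i 1 "$target" 2>&1 | stamp_lines >> "$logfile"
}
```

Start monitor_ping <router-ip> in the background on a CI machine before the build, then compare the gaps or error lines in the log with the time the build failed.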
    

2. ZeroTier Logs and Diagnostics

  • Check ZeroTier Logs: On both the router and controller, review ZeroTier logs for any disconnection events or error messages:
    sudo journalctl -u zerotier-one
    
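  • Inspect Peer Paths: As far as I know, zerotier-cli has no per-network log-level setting (its set command covers options such as allowManaged and allowDefault). Instead of raising verbosity, check whether each peer is reached directly or through a relay, and watch for changes around the time of a drop:
    sudo zerotier-cli peers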
    
  • Network Status: Regularly check the ZeroTier connection status on both the router and CI machines:
    zerotier-cli status
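
To check the status regularly rather than by hand, the commands can be run in a loop with timestamps. This is a rough sketch under a few assumptions: snapshot and watch_zerotier are hypothetical helper names, the log file name is a placeholder, and zerotier-cli is assumed to need root on your machines.

```shell
# Run a command and prefix each output line with a UTC timestamp and a label.
# Usage: snapshot <label> <command> [args...]
snapshot() {
    local label="$1"
    shift
    local ts
    ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    "$@" 2>&1 | while IFS= read -r line; do
        printf '%s [%s] %s\n' "$ts" "$label" "$line"
    done
}

# Append node status and the peer list to a log at a fixed interval.
# Usage: watch_zerotier [interval-seconds] [logfile]
watch_zerotier() {
    local interval="${1:-60}"
    local logfile="${2:-zerotier-state.log}"
    while true; do
        {
            snapshot status sudo zerotier-cli status
            snapshot peers sudo zerotier-cli peers
        } >> "$logfile"
        sleep "$interval"
    done
}
```

Running this on both the router and a CI machine gives two logs to compare against the time of a dropped connection. I would not rely on parsing the exact output columns, since they may differ between ZeroTier versions.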
    

3. Check Keepalive and Timeout Settings

  • TCP Keepalive: Lower the TCP keepalive time on the router and CI machines so that idle connections keep sending traffic. The Linux default is 7200 seconds (2 hours), which is much longer than your 30-minute gaps. Note that this setting only affects sockets that enable SO_KEEPALIVE, and changing it requires root:
    sudo sysctl -w net.ipv4.tcp_keepalive_time=120
    
  • To make the change permanent, add the setting to /etc/sysctl.conf.
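
It helps to know how long an idle connection can sit before the kernel notices the peer is gone. That is the keepalive time plus the probe interval multiplied by the number of probes. A small sketch of the arithmetic follows. The function names are made up for this example, and the second function assumes a Linux host with sysctl available.

```shell
# Worst-case seconds before a dead, idle TCP connection is detected:
# tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes
keepalive_detect_seconds() {
    local idle="$1"
    local intvl="$2"
    local probes="$3"
    echo $(( idle + intvl * probes ))
}

# Same calculation using the values currently set in the kernel (Linux only).
current_keepalive_detect_seconds() {
    keepalive_detect_seconds \
        "$(sysctl -n net.ipv4.tcp_keepalive_time)" \
        "$(sysctl -n net.ipv4.tcp_keepalive_intvl)" \
        "$(sysctl -n net.ipv4.tcp_keepalive_probes)"
}
```

With the Linux defaults (7200, 75, 9), keepalive_detect_seconds 7200 75 9 prints 7875, so no probe is sent at all during a 30-minute idle gap. With the time lowered to 120 it prints 795, and the first probe goes out after two minutes of silence.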

4. Router Mode Configuration

  • Confirm NAT and Forwarding: Ensure that the router forwards traffic to the services behind it and that its NAT rules are not dropping packets. Plain iptables -L lists only the filter table, so list the nat table as well:
    sudo iptables -L -v -n
    sudo iptables -t nat -L -v -n
    
  • MTU Issues: Check if the MTU (Maximum Transmission Unit) settings are consistent across the network, as mismatched MTU can lead to packet fragmentation and drops.
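
One way to test the MTU along a path is to send pings that are not allowed to fragment, sized to the MTU you expect. This is a sketch, assuming IPv4 and the Linux iputils ping (the -M do option is not available everywhere). The function names are made up for this example.

```shell
# ICMP payload size that yields an IPv4 packet of exactly the given MTU:
# subtract the 20-byte IPv4 header and the 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three non-fragmentable pings sized to the given MTU.
# Usage: probe_mtu <target-ip> <mtu>
probe_mtu() {
    local target="$1"
    local mtu="$2"
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$target"
}
```

For example, probe_mtu <router-ip> 1500 sends 1472-byte payloads over the physical path. ZeroTier interfaces commonly default to an MTU of 2800, so on the ZeroTier addresses it is worth trying 2800 as well and comparing with what ip link reports on each side.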

5. CI Environment Troubleshooting

  • Check for Network Sleep/Timeout: Verify that the CI machines don’t go into network idle states during the 30-minute gaps, which may trigger connection resets.

6. ZeroTier Controller Monitoring

  • Latency and Jitter: Use tools like mtr (combines ping and traceroute) to monitor network latency and packet loss over time between the controller and the CI machines:
    mtr <controller-ip>
    
  • Bandwidth and Load: Check if the ZeroTier controller or router is overloaded during CI builds, as this could lead to connection drops.

These steps should help you gather more data and identify the cause of the connection issues during CI builds.
