ZeroTier multipath failover is very slow (16 seconds)

Hello,

Here’s my setup illustration:
Google Drive link

(1) Goal
I’m trying to set up channel bonding. The bonding itself works, but the failover between RJ45 LAN and Wi-Fi takes around 16 seconds, which feels too long. My goal is to get it under 2 seconds.

(2) local.conf
Located at /var/lib/zerotier-one.
I confirmed that all settings were applied correctly using sudo zerotier-cli info -j.

{
  "settings": {
    "allowManaged": true,
    "allowGlobal": false,
    "allowDefault": false,
    "allowDNS": false,
    "defaultBondingPolicy": "rapid-active-backup",
    "policies": {
      "rapid-active-backup": {
        "basePolicy": "active-backup",
        "falloverInterval": 1000,
        "links": {
          "eth0": {
            "ipvPref": 46,
            "failoverTo": "wlan0"
          },
          "wlan0": {
            "ipvPref": 46,
            "failoverTo": "eth0"
          }
        }
      }
    }
  }
}

Even though I set falloverInterval to 1000, the failover time is the same as the default.

(3) What I tried

  • Verified interface names with sudo lshw -C network and used the logical names in the config.
  • Tested with both TCP and ping, but failover time remained slow.

I found out that I had misspelled failoverInterval as falloverInterval.

After correcting it, channel bonding worked flawlessly on the first try.
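The misspelled key did not cause any obvious error; the policy simply behaved as if the default applied. A quick key check can catch this kind of mistake before any cable-pulling tests. Below is a minimal Python sketch that walks settings.policies in local.conf and flags unrecognized keys, suggesting the nearest known name with difflib. The allowlists KNOWN_POLICY_KEYS and KNOWN_LINK_KEYS are an assumed, partial set of option names, not an official list, so extend them from the ZeroTier multipath documentation.

```python
import difflib
import json
import sys

# Assumed, non-exhaustive allowlists; verify against ZeroTier's multipath docs.
KNOWN_POLICY_KEYS = {"basePolicy", "failoverInterval", "links",
                     "linkSelectMethod", "upDelay", "downDelay"}
KNOWN_LINK_KEYS = {"ipvPref", "failoverTo", "mode", "capacity"}


def suggest(key, known):
    """Return the closest known key name, or None if nothing is similar."""
    matches = difflib.get_close_matches(key, sorted(known), n=1, cutoff=0.8)
    return matches[0] if matches else None


def warn(where, key, known):
    """Format one warning line, with a 'did you mean' hint when available."""
    hint = suggest(key, known)
    tail = f" (did you mean '{hint}'?)" if hint else ""
    return f"{where}: unknown key '{key}'{tail}"


def check_config(conf):
    """Return warnings for unrecognized keys under settings.policies."""
    warnings = []
    policies = conf.get("settings", {}).get("policies", {})
    for pname, policy in policies.items():
        for key in policy:
            if key not in KNOWN_POLICY_KEYS:
                warnings.append(warn(f"policy '{pname}'", key, KNOWN_POLICY_KEYS))
        for lname, link in policy.get("links", {}).items():
            for key in link:
                if key not in KNOWN_LINK_KEYS:
                    warnings.append(warn(f"link '{lname}'", key, KNOWN_LINK_KEYS))
    return warnings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_local_conf.py /var/lib/zerotier-one/local.conf
    with open(sys.argv[1]) as f:
        for w in check_config(json.load(f)):
            print(w)
```

Run against the original config above, it prints a single line: policy 'rapid-active-backup': unknown key 'falloverInterval' (did you mean 'failoverInterval'?).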

However, when I repeatedly unplug either the RJ45 LAN or Wi-Fi, the bonding becomes noticeably slower. I think some additional configuration may need to be adjusted.

Currently, the system behaves like a priority failover: RJ45 takes precedence, and if it goes down Wi-Fi steps in, but once Wi-Fi also goes down, there is nothing left to fall back on.

What I actually want is more of a circular failover structure, where RJ45 and Wi-Fi continuously back each other up — so if one drops, the other immediately takes over, and this keeps alternating as long as at least one of them is available.
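The local.conf above already points eth0's failoverTo at wlan0 and wlan0's failoverTo at eth0, which looks like the circular arrangement described here. To make the intended behavior concrete, here is a toy Python model (not ZeroTier's actual link-selection logic) that follows the failoverTo chain from the current link until it finds one that is up.

```python
# Toy model only: mirrors the failoverTo entries from the local.conf above.
FAILOVER_TO = {"eth0": "wlan0", "wlan0": "eth0"}


def next_active(current, up, failover_to=FAILOVER_TO):
    """Follow failoverTo hops from `current`; return the first link that is up.

    `up` is the set of link names currently considered healthy.
    Returns None when every link in the chain is down.
    """
    link, seen = current, set()
    while link is not None and link not in seen:
        if link in up:
            return link
        seen.add(link)
        link = failover_to.get(link)
    return None


def simulate(events, start="eth0"):
    """Replay (link, is_up) events and record the active link after each one."""
    up = {"eth0", "wlan0"}
    active = start
    history = []
    for link, is_up in events:
        if is_up:
            up.add(link)
        else:
            up.discard(link)
        # After a total outage (active is None), restart from the preferred link.
        active = next_active(active if active is not None else start, up)
        history.append(active)
    return history
```

In this model traffic stops only while both links are down at the same time, and it does not fail back to eth0 when the cable returns. Whether ZeroTier returns to a preferred link is assumed here to depend on other policy options (for example linkSelectMethod), so check the multipath docs before relying on this.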

I also tried the config below, but the video stream crashes.
(With the active-backup policy, I receive the video fine.)

[Setting]
{ "settings": { "defaultBondingPolicy": "broadcast" } }

[Video Result]

Here is the command output from my bonding test between a Windows desktop and a Linux Jetson Nano (both connected via Ethernet and Wi-Fi):

  1. Linux (Jetson Nano)

Result of sudo zerotier-cli bond list:

yjp@yjp-desktop:/var/lib/zerotier-one$ sudo zerotier-cli bond list
    <peer>                        <bondtype>     <links>
cafe80ed74                              none         0/3
e26fbb3da1                     active-backup         0/2

  2. Windows (Desktop, run as Administrator)

Result of zerotier-cli bond list:

PS C:\Users\RGBLAB> zerotier-cli bond list
    <peer>                        <bondtype>     <links>
7c040e4e6f                     active-backup         0/0
cafefd6717                              none         0/0

Both machines are connected to the same network via Ethernet and Wi-Fi.
On Linux, the active-backup bond shows links (0/2), but on Windows the active-backup bond only shows 0/0.

Is it even possible for the two ends of the same bond to report different link counts, 0/2 on one side and 0/0 on the other?
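To compare the two ends side by side, the bond list output can be parsed mechanically. A minimal Python sketch, assuming the column layout shown above; what the two numbers in <links> mean (possibly alive links / total links) is an assumption to verify against the ZeroTier documentation.

```python
import re

# One data row: 10-hex-digit peer ID, bond type, then "A/B" link counts.
ROW = re.compile(r"^\s*([0-9a-f]{10})\s+(\S+)\s+(\d+)/(\d+)\s*$")


def parse_bond_list(text):
    """Return {peer: (bondtype, a, b)} for every data row in the output."""
    bonds = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # prompt and header lines do not match and are skipped
            peer, bondtype, a, b = m.groups()
            bonds[peer] = (bondtype, int(a), int(b))
    return bonds


def bonded(bonds):
    """Keep only peers with an actual bonding policy (bondtype other than 'none')."""
    return {p: v for p, v in bonds.items() if v[0] != "none"}
```

For the Linux output above, parse_bond_list gives {'cafe80ed74': ('none', 0, 3), 'e26fbb3da1': ('active-backup', 0, 2)}, and bonded keeps only the e26fbb3da1 entry.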

Found a solution, but I need an expert opinion:

It seems that channel bonding and failover themselves had been working well all along.

Previously, failover always seemed to take more than 16 seconds, and I finally found the reason.
Failover itself is fast (within 1 second), but only after the link has gone through some kind of re-registration or stabilization process. For example, when I unplug and re-plug the LAN cable, or disable and re-enable Wi-Fi, I have to wait more than 16 seconds before that link is ready. After that waiting period, failover works very quickly.

I would like to ask:

  • Why does this long “waiting period” happen before fast failover kicks in?
  • Is there any way to shorten this and make failover happen immediately?

For reference: I also wrote a local.conf file on the Windows side (C:\ProgramData\ZeroTier\One), which I didn't mention in my previous post. Sorry, I'm still a beginner.

Explanation of the symptom (feel free to skip if it's too long):

  • If I plug in a LAN cable but do not wait the 16 seconds for the LAN to stabilize, and then disconnect Wi-Fi, the LAN connection is not yet ready and cannot be used.
  • I had mistakenly thought this waiting period was the failover itself.
  • Once the LAN has fully stabilized (after about 16 seconds), actual failover works very quickly — within 1 second.

So the “long failover” I reported before was really just the time required for the newly plugged LAN to stabilize, not the failover mechanism itself.
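To separate stabilization time from failover time with numbers rather than by feel, one option is to probe the peer continuously and log every outage window. A minimal Python sketch, assuming a Linux host with the iputils ping (-c count, -W reply timeout in seconds); PEER_IP is a placeholder, not an address from this thread.

```python
import subprocess
import time

PEER_IP = "192.0.2.10"  # placeholder (TEST-NET-1); use the peer's ZeroTier IP


def outages(samples):
    """Return (start, end, duration) for each run of failed probes.

    `samples` is a time-ordered list of (timestamp_seconds, ok) pairs.
    An outage starts at the first failed probe and ends at the next
    successful one; an outage still open at the end has end=None.
    """
    result, start = [], None
    for t, ok in samples:
        if not ok and start is None:
            start = t
        elif ok and start is not None:
            result.append((start, t, t - start))
            start = None
    if start is not None:
        result.append((start, None, None))
    return result


def probe(ip, timeout_s=1):
    """Send one Linux ping; True if a reply arrived within timeout_s."""
    cmd = ["ping", "-c", "1", "-W", str(timeout_s), ip]
    return subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0


def record(ip, seconds, interval=0.2):
    """Probe `ip` repeatedly for `seconds` and return the collected samples."""
    samples, end = [], time.monotonic() + seconds
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(ip)))
        time.sleep(interval)
    return samples
```

For example, run for start, end, dur in outages(record(PEER_IP, 60)): print(start, end, dur) while pulling and re-plugging a link. A failed probe can take up to the 1-second timeout, so resolution is roughly one second: enough to tell 16 s from 1 s, not for sub-second tuning.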

So how can I reduce this stabilization time? Also, pulling out the LAN cable or turning off Wi-Fi is just my way of simulating a dropped wireless data link. Is there a better way to simulate it?
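Two ways to cut a link from software on Linux, both sketched in Python below and both requiring root: set the interface administratively down with iproute2 (ip link set dev IFACE down), or keep the link up but drop every outgoing packet with tc's netem qdisc. Neither is identical to a pulled cable or a fading Wi-Fi signal: NetworkManager may react differently to an admin-down interface, and netem on the root qdisc only affects packets leaving this machine. The interface names are examples.

```python
import subprocess
import time


def link_cmd(iface, up):
    """iproute2 command that sets `iface` administratively up or down."""
    return ["ip", "link", "set", "dev", iface, "up" if up else "down"]


def netem_cmd(iface, drop):
    """tc command that adds (drop=True) or removes a 100% egress-loss qdisc."""
    if drop:
        return ["tc", "qdisc", "add", "dev", iface, "root", "netem", "loss", "100%"]
    return ["tc", "qdisc", "del", "dev", iface, "root"]


def outage(start_cmd, stop_cmd, seconds, dry_run=True):
    """Run start_cmd, wait `seconds`, then run stop_cmd (needs root).

    With dry_run=True the commands are only printed, not executed.
    """
    for cmd, pause in ((start_cmd, seconds), (stop_cmd, 0)):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
        time.sleep(pause)
```

For example, outage(netem_cmd("wlan0", True), netem_cmd("wlan0", False), 10, dry_run=False) drops outgoing Wi-Fi traffic for 10 seconds while the interface itself stays up, which is arguably closer to a radio link going silent than to an unplugged cable.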

Have you considered that the LAN stabilisation phase is the network port itself doing a DHCP lookup?
So, you connect the cord, and the computer requests an IP address, which can take some time. Once an address is allocated, the interface can bring itself up into operation.

If your DHCP server is slow to allocate addresses, then the entire process of the interface becoming ready can be particularly slow; some systems have a 5-second timeout before they try again. Also, if there is some form of broadcast packet storm control in the network, then the DHCP allocation/request packets can be dropped, causing multiple retries for an IP address.
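To put numbers on the retry idea: RFC 2131 (section 4.1) suggests, for a 10 Mb/s Ethernet example, waiting 4 seconds before the first retransmission and doubling the delay up to a maximum of 64 seconds, each randomized by ±1 second. The Python sketch below computes that nominal schedule without the randomization. Real DHCP clients (NetworkManager's, dhclient, Windows) may use different timers, so treat it as a reference point only.

```python
def retransmit_delays(n, first=4, cap=64):
    """First `n` nominal DHCP retransmission delays in seconds (RFC 2131 4.1).

    Randomization (+/-1 s per delay) is intentionally left out.
    """
    delays, d = [], first
    for _ in range(n):
        delays.append(d)
        d = min(d * 2, cap)
    return delays


def wait_after_drops(drops):
    """Extra seconds waited before the next attempt after `drops` lost requests."""
    return sum(retransmit_delays(drops))
```

Under this schedule, two lost requests already add about 12 seconds (4 + 8), the same order of magnitude as the 16 seconds reported; only a packet capture could show whether that is actually what happens here.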

Thank you for your response and expert knowledge.

I tried the following command (please correct me if I’m wrong):

journalctl -u NetworkManager -f

I unplugged the LAN cable or turned off Wi-Fi after seeing the message “device Activation: successful, device activated.”
However, the video stream did not work, and even ping failed for a long time.

This makes me think the issue is not related to DHCP speed (the network had been activated successfully).

Instead, I suspect the system may be slow to recognize that the link is disconnected. I observed a delay in detecting the disconnection, but this is only my interpretation, not backed by solid reasoning.

Below is the output of journalctl -u NetworkManager -f:

-- Logs begin at Tue 2024-06-18 05:29:39 KST. --
Sep 18 21:28:01 yjp-desktop NetworkManager[3443]: <info>  [1758198481.5781] dhcp4 (wlan0): state changed unknown -> bound
Sep 18 21:28:01 yjp-desktop NetworkManager[3443]: <info>  [1758198481.5819] device (wlan0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Sep 18 21:28:01 yjp-desktop NetworkManager[3443]: <info>  [1758198481.6136] device (wlan0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Sep 18 21:28:01 yjp-desktop NetworkManager[3443]: <info>  [1758198481.6144] device (wlan0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Sep 18 21:28:01 yjp-desktop NetworkManager[3443]: <info>  [1758198481.6157] manager: NetworkManager state is now CONNECTED_LOCAL
Sep 18 21:28:01 yjp-desktop NetworkManager[3443]: <info>  [1758198481.6240] manager: NetworkManager state is now CONNECTED_SITE
Sep 18 21:28:01 yjp-desktop NetworkManager[3443]: <info>  [1758198481.6244] policy: set 'HY_AP' (wlan0) as default for IPv4 routing and DNS
Sep 18 21:28:01 yjp-desktop NetworkManager[3443]: <info>  [1758198481.6276] device (wlan0): Activation: successful, device activated.
Sep 18 21:28:02 yjp-desktop NetworkManager[3443]: <info>  [1758198482.0931] manager: NetworkManager state is now CONNECTED_GLOBAL
Sep 18 21:28:13 yjp-desktop NetworkManager[3443]: <info>  [1758198493.4994] device (eth0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
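To test the "slow to detect disconnection" theory, it helps to read NetworkManager's own sub-second timestamps rather than the wall-clock column. A minimal Python sketch, assuming every line carries a bracketed value such as [1758198481.6276], which here is Unix epoch seconds (1758198481 corresponds to Sep 18 21:28:01 KST, matching the log).

```python
import re

# "[<seconds>.<fraction>]" followed by the message; "[3443]" (the PID) has no
# dot, so the regex skips it and matches the timestamp instead.
LINE = re.compile(r"\[(\d+\.\d+)\]\s+(.*)$")


def events(lines):
    """Return (timestamp, message) for each NetworkManager log line."""
    out = []
    for line in lines:
        m = LINE.search(line)
        if m:
            out.append((float(m.group(1)), m.group(2)))
    return out


def first_time(evts, needle):
    """Timestamp of the first event whose message contains `needle`, else None."""
    return next((t for t, msg in evts if needle in msg), None)
```

With the lines above, the gap between wlan0's "Activation: successful" and eth0's "activated -> unavailable (reason 'carrier-changed')" comes out at about 11.87 s. On its own that only says when NetworkManager noticed eth0 losing carrier; comparing it with the moment the cable was actually pulled would show whether detection is slow.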

I am beginning to wonder whether you should use Wireshark to capture a dump of the network traffic. It will let you examine the packets on the wire to determine when DHCP requests are sent or dropped, and when replies are received or dropped. Beyond that, you will also be able to see what else is happening on your network, which can be quite helpful.
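As a concrete starting point, here is a Python sketch that builds a tshark (Wireshark's command-line tool) capture limited to DHCP (UDP 67/68) and ZeroTier's default UDP port 9993. It assumes tshark is installed and you have capture permissions. ZeroTier's port can be changed in local.conf and it may also use additional ports, so some of its traffic can fall outside this filter.

```python
import subprocess

# BPF capture filter: DHCP server/client ports plus ZeroTier's default port.
DHCP_AND_ZT = "udp port 67 or udp port 68 or udp port 9993"


def tshark_cmd(iface, outfile, bpf=DHCP_AND_ZT):
    """Return argv for capturing `bpf` traffic on `iface` into `outfile`."""
    return ["tshark", "-i", iface, "-f", bpf, "-w", outfile]


def capture(iface, outfile, seconds):
    """Run the capture, stopping after `seconds` via tshark's duration autostop."""
    cmd = tshark_cmd(iface, outfile) + ["-a", f"duration:{seconds}"]
    subprocess.run(cmd, check=True)
```

For example, capture("eth0", "eth0-dhcp-zt.pcap", 60) records one minute while you re-plug the cable; open the file in Wireshark afterwards and apply the display filter dhcp to see request and reply timing.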