ZeroTier is not using all the bandwidth available on the secondary network interface

I have a 3-node Proxmox cluster in a data center. The hosts are connected to the internet via a public gateway over a 1 Gbps connection. They are also equipped with a 10 Gbps connection to a private switch. Please see the attached diagram.

I intend to use ZeroTier in such a way that peer-to-peer communication uses the 10 Gbps connection, and if there is an issue with the 10 Gbps connection, the link falls back to the 1 Gbps connection.

I have set up ZeroTier in the "balance-aware" bonding mode and also indicated to ZeroTier that the 10 Gbps connection is available. But somehow ZeroTier is not consuming all the available bandwidth of the 10 Gbps connection, and peer communication ends up much slower than 10 Gbps.

This is the /var/lib/zerotier-one/local.conf for your reference. All three hosts have the same local.conf.

    "physical": {
        "10.0.0.0/24": {
            "trustedPathId": 101010024
        }
    },
 "settings":
  {
         "interfacePrefixBlacklist": [ "tap100i0","fwbr100i0","fwpr100p0@fwln100i0","fwln100i0@fwpr100p0", "eno1","eno2","eno3","eno4","enp129s0" ],
    "defaultBondingPolicy": "custom-balance-aware",
    "policies":
    {
      "custom-balance-aware":
      {
        "basePolicy": "balance-aware",
        "failoverInterval": 5000,
        "linkQuality": {
          "lat_max" : 400.0,
          "pdv_max" : 20.0,
          "lat_weight" : 0.5,
          "pdv_weight" : 0.5
        },
        "links": {
          "vmbr1": { "capacity": 10000 },
          "vmbr0": { "capacity": 1000  }
        }
      }
    }
  }
}
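For reference, after editing local.conf I restart the service on each node so the new bond policy is picked up (assuming the default systemd unit name), then confirm it loaded:

    # apply the edited local.conf and confirm the custom policy is active
    systemctl restart zerotier-one
    zerotier-cli bond list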

Here is a sample speed test I ran showing the difference.

While using the ZeroTier IP:

root@CE-FS-Node3:/var/lib/zerotier-one# iperf -i 10 -c 10.147.17.191
------------------------------------------------------------
Client connecting to 10.147.17.191, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 10.147.17.76 port 51464 connected with 10.147.17.191 port 5001 (icwnd/mss/irtt=26/2748/884257)
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0000 sec  2.56 GBytes  2.20 Gbits/sec
[  1] 0.0000-11.0238 sec  2.56 GBytes  1.99 Gbits/sec
root@CE-FS-Node3:/var/lib/zerotier-one#

While using the private IP of the same target machine:

root@CE-FS-Node3:/var/lib/zerotier-one# iperf -i 10 -c 10.0.0.6
------------------------------------------------------------
Client connecting to 10.0.0.6, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 10.0.0.9 port 49198 connected with 10.0.0.6 port 5001 (icwnd/mss/irtt=14/1448/201)
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0000 sec  10.9 GBytes  9.40 Gbits/sec
[  1] 0.0000-10.0101 sec  10.9 GBytes  9.39 Gbits/sec
root@CE-FS-Node3:/var/lib/zerotier-one#

And here are the bond details:

root@CE-FS-Node3:/var/lib/zerotier-one# zerotier-cli bond list
    <peer>                        <bondtype>     <links>
e9e8af739c                     balance-aware         0/6
f27b9e5ee1                     balance-aware         18/18
root@CE-FS-Node3:/var/lib/zerotier-one# zerotier-cli bond f27b9e5ee1 show
Peer                   : f27b9e5ee1
Bond                   : balance-aware
Link Select Method     : 0
Links                  : 18/18
Failover Interval (ms) : 5000
Up Delay (ms)          : 0
Down Delay (ms)        : 0
Packets Per Link       : 64

idx                  interface                                  path               socket
----------------------------------------------------------------------------------------------------
 0:                      vmbr1                                       10.0.0.6/9993 0000559c300d2cd0
 1:                      vmbr1                                       10.0.0.6/9993 0000559c300d0210
 2:                      vmbr1                                       10.0.0.6/9993 0000559c300d0420
 3:                      vmbr0                                  123.191.123.201/9993 0000559c300d0580
 4:                      vmbr0                                  123.191.123.201/9993 0000559c300d0160
 5:                      vmbr0                                  123.191.123.201/9993 0000559c300d0370
 6:                      vmbr1                                      10.0.0.6/23987 0000559c300d2cd0
 7:                      vmbr1                                      10.0.0.6/23987 0000559c300d0210
 8:                      vmbr0                                 123.191.123.201/23987 0000559c300d0370
 9:                      vmbr0                                 123.191.123.201/23987 0000559c300d0160
10:                      vmbr1                                      10.0.0.6/23987 0000559c300d0420
11:                      vmbr0                                 123.191.123.201/23987 0000559c300d0580
12:                      vmbr1                                      10.0.0.6/26868 0000559c300d2cd0
13:                      vmbr1                                      10.0.0.6/26868 0000559c300d0210
14:                      vmbr1                                      10.0.0.6/26868 0000559c300d0420
15:                      vmbr0                                 123.191.123.201/26868 0000559c300d0160
16:                      vmbr0                                 123.191.123.201/26868 0000559c300d0370
17:                      vmbr0                                 123.191.123.201/26868 0000559c300d0580

idx     lat      pdv     plr     per    capacity    qual      rx_age      tx_age  eligible  bonded
----------------------------------------------------------------------------------------------------
 0:     0.09     0.00  0.0000  0.0000      10000  0.1011        1619        1619         1       1
 1:     0.05     0.00  0.0000  0.0000      10000  0.1011         119         119         1       1
 2:     0.05     0.00  0.0000  0.0000      10000  0.1011          15          16         1       1
 3:     0.11     0.00  0.0000  0.0000       1000  0.0101        1119        1119         1       1
 4:     0.11     0.00  0.0000  0.0000       1000  0.0101         619         619         1       1
 5:     0.14     0.00  0.0000  0.0000       1000  0.0101         119         619         1       1
 6:     0.12     0.00  0.0000  0.0000      10000  0.1011         619         619         1       1
 7:     0.09     0.00  0.0000  0.0000      10000  0.1011        1119        1119         1       1
 8:     0.16     0.00  0.0000  0.0000       1000  0.0101         619         619         1       1
 9:     0.14     0.00  0.0000  0.0000       1000  0.0101         619         619         1       1
10:     0.06     0.00  0.0000  0.0000      10000  0.1011         619         619         1       1
11:     0.08     0.00  0.0000  0.0000       1000  0.0101         619         619         1       1
12:     0.20     0.00  0.0000  0.0000      10000  0.1011         515         516         1       1
13:     0.11     0.00  0.0000  0.0000      10000  0.1011         619         619         1       1
14:     0.16     0.00  0.0000  0.0000      10000  0.1011         619         619         1       1
15:     0.11     0.00  0.0000  0.0000       1000  0.0101         619         619         1       1
16:     0.16     0.00  0.0000  0.0000       1000  0.0091         619         619         1       1
17:     0.08     0.00  0.0000  0.0000       1000  0.0101         619         619         1       1
root@CE-FS-Node3:/var/lib/zerotier-one#

I would really appreciate any clues to help solve this puzzle.

Just guessing here, but that looks like a CPU bottleneck in the link encryption going through ZeroTier. Even when the devices are on the same network, the communication remains encrypted. ZeroTier is not multithreaded, so it looks like you're pegging one CPU core at 100% to handle the encryption/decryption and general packet handling.
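One way to check that (a sketch, assuming the sysstat package is installed for pidstat) is to watch per-thread CPU usage of the zerotier-one process while the iperf test is running:

    # a single thread pinned near 100% points to a crypto/packet-handling bottleneck
    pidstat -t -p "$(pidof zerotier-one)" 1
    # or interactively:
    top -H -p "$(pidof zerotier-one)"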

I don't think there's any provision for having ZeroTier skip encryption between nodes that are on the same LAN segment.

Hi Erik, thank you for your response!

ZeroTier does have a configuration option called trustedPathId, as per the official documentation.

I will check the CPU angle. If that is the cause, then it's a bummer; I guess there would be no way to solve it in that case.

Thanks!

You can run `zerotier-cli listpeers -j` and look for the peer you're trying to use the trusted path with. There it should show the trustedPathId. If it shows that, then at least ZeroTier is interpreting the JSON correctly and "should" be working.
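Something along these lines (assuming jq is available; peer ID taken from your bond output) pulls out just the relevant fields:

    zerotier-cli listpeers -j | jq '
      .[] | select(.address == "f27b9e5ee1")
          | .paths[] | {address, ifname, trustedPathId}'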

With that said, I was testing this in my lab a while back with version 1.12.2 on both systems, and though ZeroTier indicated that both peers were using the trusted path with the same ID, it wasn't actually bypassing encryption. Packet captures were identical with and without the trusted path configured. I'm not sure if it's just currently not working as intended, or if there's a knob that needs to be turned that isn't accounted for in their documentation.
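If you want to verify on your side, something like this (interface name and peer IP taken from your earlier output) captures the flow during an iperf run; repeat with the trusted path removed and compare whether the payloads are readable:

    # capture ZeroTier traffic to the peer during the iperf run
    tcpdump -ni vmbr1 'udp and host 10.0.0.6' -w trustedpath-on.pcap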

EDIT: I should also add that even without encryption, ZeroTier still uses mechanisms to stretch Layer 2. Assuming it uses a normal VXLAN-style packet, it has to shim in UDP, VXLAN, inner MAC, and inner IP headers, which carries a performance hit, so you shouldn't expect the same performance as a plain IP test. I would still expect more than you're currently seeing, though, even with the additional encapsulation.
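As a rough back-of-envelope (assuming VXLAN-style encapsulation of about 50 bytes of outer headers on a 1500-byte wire MTU):

    # overhead estimate -- assumption: ~50 bytes of outer headers per packet
    # payload efficiency ~= (1500 - 50) / 1500 ~= 0.967
    # i.e. roughly 9.7 Gbps of goodput should still be possible on a 10 Gbps
    # link; encapsulation alone doesn't explain a drop to ~2 Gbps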


Consider that, with other VPNs like OpenVPN, it is also difficult to saturate a 1 Gbit connection… filling a 10 Gbit connection with encrypted VPN traffic is very high-end!

Hi There,

Thanks for all the suggestions. I will check the settings as per your suggestions and report back.

Hi There,

Just checked whether the TrustedPathId is being used or not, and it is indeed being used as configured. But ZeroTier is still not using all the available speed of the pipe. It seems it must be some kind of bug. Looks like a dead end for me on ZeroTier.

    "address": "f27b9e5ee1",
    "bondingPolicyCode": 5,
    "bondingPolicyStr": "balance-aware",
    "downDelay": 0,
    "failoverInterval": 5000,
    "isBonded": true,
    "latency": 0,
    "numAliveLinks": 18,
    "numTotalLinks": 18,
    "packetsPerLink": 64,
    "paths": [
        {
            "active": true,
            "address": "10.0.0.6/9993",
            "assignedFlowCount": 0,
            "bonded": 1,
            "eligible": 1,
            "expired": false,
            "givenLinkSpeed": 10000,
            "ifname": "vmbr1",
            "lastInAge": 1466,
            "lastOutAge": 1466,
            "lastReceive": 1697754260771,
            "lastSend": 1697754260771,
            "latencyMean": 0.0,
            "latencyVariance": 0.0,
            "localSocket": 94129309428944,
            "packetErrorRatio": 0.0,
            "packetLossRatio": 0.0,
            "preferred": false,
            "relativeQuality": 0.10101150721311569,
            "trustedPathId": 101010024
        },

Here’s a thread from 2021 where they state Trusted Paths are deprecated and will be removed in 2.0:

Unfortunately it seems like they didn’t have a strong enough business case to maintain it.

You can certainly still use ZeroTier, you’d just need to rethink how you want to accomplish what you’re trying to do. Realistically, you don’t need to talk east-to-west in your datacenter using ZeroTier, since they’re already on a common LAN. You can just communicate between them using that 10.0.0.0/24 network. You’d simply use ZeroTier to let remote hosts get to the 10.0.0.0/24 network as a managed route.
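The managed route itself is just an entry on the ZeroTier network (Central → your network → Managed Routes); as JSON it's roughly this shape, with the via being the ZeroTier-managed IP of whichever node acts as the gateway (address here taken from your iperf output, purely as an example):

    { "target": "10.0.0.0/24", "via": "10.147.17.191" }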

Hi,

The intention of using ZeroTier here is to use the 10 Gbps link as the primary. If there is an issue on the primary, I expect ZeroTier to fall back to the secondary 1 Gbps connection. The servers are in a data center, and it's not easy to add another 10 Gbps link as a secondary and use a Linux bond. Since that option is not easy to implement, I was hoping ZeroTier would solve this problem.

Anyway, thanks for taking the time to answer. :smiley:

Gotcha! So you never really needed the VPN functionality of ZeroTier; you were just looking for something you could use to easily control path selection.

You can install FRR on Proxmox and use it to control path selection. It won't quite have the intelligence you'd get from a typical SDN solution (failover based on loss, latency, jitter, etc.), but you would be able to detect when a path can't forward traffic and let the network reconverge onto the backup path.
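A minimal sketch of that idea (assumptions: OSPF via FRR, your vmbr0/vmbr1 bridge names, a placeholder subnet for the 1 Gbps path, and a unique router-id per node):

    # enable ospfd=yes in /etc/frr/daemons first, then append:
    cat >> /etc/frr/frr.conf <<'EOF'
    interface vmbr1
     ip ospf cost 10
    interface vmbr0
     ip ospf cost 100
    router ospf
     ospf router-id 10.0.0.9
     network 10.0.0.0/24 area 0
     network 192.0.2.0/24 area 0
    EOF
    systemctl restart frr

Lower cost wins, so traffic prefers vmbr1 (10 Gbps) and reconverges onto vmbr0 (1 Gbps) when the 10 Gbps path stops forwarding.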

Hi,

Thanks for showing me a new avenue! I will check it out, and will continue to use ZeroTier for the VPN side only. I have to say, when I looked at those customization options, I was very impressed.
