I have a 3-node Proxmox cluster in a data center. The nodes reach the internet through a public gateway over a 1 Gbps connection. Each host also has a 10 Gbps connection to a private switch. Please see the attached diagram.
I intend to use ZeroTier so that traffic between the peers goes over the 10 Gbps connection, and falls back to the 1 Gbps connection only if there is a problem with the 10 Gbps link.
I have set up ZeroTier in the "balance-aware" bonding mode and declared the 10 Gbps link to ZeroTier through its link capacity. However, ZeroTier does not use all the available bandwidth of the 10 Gbps connection, and peer-to-peer throughput ends up far below 10 Gbps.
This is /var/lib/zerotier-one/local.conf for reference. All three hosts have the same local.conf.
{
"physical": {
"10.0.0.0/24": {
"trustedPathId": 101010024
}
},
"settings":
{
"interfacePrefixBlacklist": [ "tap100i0","fwbr100i0","fwpr100p0@fwln100i0","fwln100i0@fwpr100p0", "eno1","eno2","eno3","eno4","enp129s0" ],
"defaultBondingPolicy": "custom-balance-aware",
"policies":
{
"custom-balance-aware":
{
"basePolicy": "balance-aware",
"failoverInterval": 5000,
"linkQuality": {
"lat_max" : 400.0,
"pdv_max" : 20.0,
"lat_weight" : 0.5,
"pdv_weight" : 0.5
},
"links": {
"vmbr1": { "capacity": 10000 },
"vmbr0": { "capacity": 1000 }
}
}
}
}
}
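As a quick sanity check on this configuration, the sketch below parses a trimmed copy of the local.conf and confirms that neither bonded interface is caught by the blacklist. It assumes that `interfacePrefixBlacklist` entries are matched as name prefixes (as the setting name suggests); `blacklisted_links` is a hypothetical helper name, not part of ZeroTier.

```python
import json

# Trimmed copy of the local.conf above; only the fields this check needs.
LOCAL_CONF = """
{
  "settings": {
    "interfacePrefixBlacklist": ["tap100i0", "fwbr100i0", "fwpr100p0@fwln100i0",
                                 "fwln100i0@fwpr100p0", "eno1", "eno2", "eno3",
                                 "eno4", "enp129s0"],
    "defaultBondingPolicy": "custom-balance-aware",
    "policies": {
      "custom-balance-aware": {
        "basePolicy": "balance-aware",
        "links": {
          "vmbr1": { "capacity": 10000 },
          "vmbr0": { "capacity": 1000 }
        }
      }
    }
  }
}
"""


def blacklisted_links(conf_text):
    """Return bonded link names that start with a blacklisted prefix.

    Assumption: blacklist entries are treated as plain name prefixes.
    """
    settings = json.loads(conf_text)["settings"]
    prefixes = settings.get("interfacePrefixBlacklist", [])
    policy = settings["policies"][settings["defaultBondingPolicy"]]
    return [name for name in policy["links"]
            if any(name.startswith(prefix) for prefix in prefixes)]


if __name__ == "__main__":
    clashes = blacklisted_links(LOCAL_CONF)
    print("blacklisted bonded links:", clashes or "none")
```

For this configuration the list comes back empty, so under that assumption both vmbr1 and vmbr0 remain usable by the bond.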
Here is a sample speed test showing the difference.
Using the ZeroTier IP:
root@CE-FS-Node3:/var/lib/zerotier-one# iperf -i 10 -c 10.147.17.191
------------------------------------------------------------
Client connecting to 10.147.17.191, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 10.147.17.76 port 51464 connected with 10.147.17.191 port 5001 (icwnd/mss/irtt=26/2748/884257)
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-10.0000 sec 2.56 GBytes 2.20 Gbits/sec
[ 1] 0.0000-11.0238 sec 2.56 GBytes 1.99 Gbits/sec
root@CE-FS-Node3:/var/lib/zerotier-one#
Using the private IP of the same target machine:
root@CE-FS-Node3:/var/lib/zerotier-one# iperf -i 10 -c 10.0.0.6
------------------------------------------------------------
Client connecting to 10.0.0.6, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 10.0.0.9 port 49198 connected with 10.0.0.6 port 5001 (icwnd/mss/irtt=14/1448/201)
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-10.0000 sec 10.9 GBytes 9.40 Gbits/sec
[ 1] 0.0000-10.0101 sec 10.9 GBytes 9.39 Gbits/sec
root@CE-FS-Node3:/var/lib/zerotier-one#
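To put a number on the gap, here is a small sketch that compares the two iperf results. It uses the first-interval figures from the runs above (2.20 and 9.40 Gbits/sec); `throughput_ratio` is just an illustrative helper name.

```python
def throughput_ratio(overlay_gbps, direct_gbps):
    """Fraction of the direct-path throughput achieved over the overlay."""
    if direct_gbps <= 0:
        raise ValueError("direct_gbps must be positive")
    return overlay_gbps / direct_gbps


# First-interval bandwidth figures taken from the iperf output above.
ZT_GBPS = 2.20      # via the ZeroTier IP 10.147.17.191
DIRECT_GBPS = 9.40  # via the private IP 10.0.0.6

ratio = throughput_ratio(ZT_GBPS, DIRECT_GBPS)

if __name__ == "__main__":
    print(f"ZeroTier path reaches {ratio:.1%} of the direct path")
```

In other words, the ZeroTier path delivers a bit under a quarter of what the same 10 Gbps link gives when addressed directly.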
And here are the bond details:
root@CE-FS-Node3:/var/lib/zerotier-one# zerotier-cli bond list
<peer> <bondtype> <links>
e9e8af739c balance-aware 0/6
f27b9e5ee1 balance-aware 18/18
root@CE-FS-Node3:/var/lib/zerotier-one# zerotier-cli bond f27b9e5ee1 show
Peer : f27b9e5ee1
Bond : balance-aware
Link Select Method : 0
Links : 18/18
Failover Interval (ms) : 5000
Up Delay (ms) : 0
Down Delay (ms) : 0
Packets Per Link : 64
idx interface path socket
----------------------------------------------------------------------------------------------------
0: vmbr1 10.0.0.6/9993 0000559c300d2cd0
1: vmbr1 10.0.0.6/9993 0000559c300d0210
2: vmbr1 10.0.0.6/9993 0000559c300d0420
3: vmbr0 123.191.123.201/9993 0000559c300d0580
4: vmbr0 123.191.123.201/9993 0000559c300d0160
5: vmbr0 123.191.123.201/9993 0000559c300d0370
6: vmbr1 10.0.0.6/23987 0000559c300d2cd0
7: vmbr1 10.0.0.6/23987 0000559c300d0210
8: vmbr0 123.191.123.201/23987 0000559c300d0370
9: vmbr0 123.191.123.201/23987 0000559c300d0160
10: vmbr1 10.0.0.6/23987 0000559c300d0420
11: vmbr0 123.191.123.201/23987 0000559c300d0580
12: vmbr1 10.0.0.6/26868 0000559c300d2cd0
13: vmbr1 10.0.0.6/26868 0000559c300d0210
14: vmbr1 10.0.0.6/26868 0000559c300d0420
15: vmbr0 123.191.123.201/26868 0000559c300d0160
16: vmbr0 123.191.123.201/26868 0000559c300d0370
17: vmbr0 123.191.123.201/26868 0000559c300d0580
idx lat pdv plr per capacity qual rx_age tx_age eligible bonded
----------------------------------------------------------------------------------------------------
0: 0.09 0.00 0.0000 0.0000 10000 0.1011 1619 1619 1 1
1: 0.05 0.00 0.0000 0.0000 10000 0.1011 119 119 1 1
2: 0.05 0.00 0.0000 0.0000 10000 0.1011 15 16 1 1
3: 0.11 0.00 0.0000 0.0000 1000 0.0101 1119 1119 1 1
4: 0.11 0.00 0.0000 0.0000 1000 0.0101 619 619 1 1
5: 0.14 0.00 0.0000 0.0000 1000 0.0101 119 619 1 1
6: 0.12 0.00 0.0000 0.0000 10000 0.1011 619 619 1 1
7: 0.09 0.00 0.0000 0.0000 10000 0.1011 1119 1119 1 1
8: 0.16 0.00 0.0000 0.0000 1000 0.0101 619 619 1 1
9: 0.14 0.00 0.0000 0.0000 1000 0.0101 619 619 1 1
10: 0.06 0.00 0.0000 0.0000 10000 0.1011 619 619 1 1
11: 0.08 0.00 0.0000 0.0000 1000 0.0101 619 619 1 1
12: 0.20 0.00 0.0000 0.0000 10000 0.1011 515 516 1 1
13: 0.11 0.00 0.0000 0.0000 10000 0.1011 619 619 1 1
14: 0.16 0.00 0.0000 0.0000 10000 0.1011 619 619 1 1
15: 0.11 0.00 0.0000 0.0000 1000 0.0101 619 619 1 1
16: 0.16 0.00 0.0000 0.0000 1000 0.0091 619 619 1 1
17: 0.08 0.00 0.0000 0.0000 1000 0.0101 619 619 1 1
root@CE-FS-Node3:/var/lib/zerotier-one#
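The bond output above can be summarized with a short sketch that counts the links per interface and works out each one's share of the total declared capacity. The interface order and capacities are copied from the two tables; treating capacity share as the basis for traffic allocation is an assumption made only for illustration, not confirmed ZeroTier behavior.

```python
# Interface for each link index 0..17, copied from the "idx interface path" table.
INTERFACES = (
    ["vmbr1"] * 3 + ["vmbr0"] * 3
    + ["vmbr1", "vmbr1", "vmbr0", "vmbr0", "vmbr1", "vmbr0"]
    + ["vmbr1"] * 3 + ["vmbr0"] * 3
)

# Declared capacity per interface, as shown in the "capacity" column.
CAPACITY = {"vmbr1": 10000, "vmbr0": 1000}


def summarize(interfaces, capacity):
    """Per interface: link count, per-link share and total share of capacity."""
    total = sum(capacity[name] for name in interfaces)
    counts = {}
    for name in interfaces:
        counts[name] = counts.get(name, 0) + 1
    return {
        name: {
            "links": count,
            "per_link_share": capacity[name] / total,
            "interface_share": count * capacity[name] / total,
        }
        for name, count in counts.items()
    }


SUMMARY = summarize(INTERFACES, CAPACITY)

if __name__ == "__main__":
    for name, row in SUMMARY.items():
        print(f"{name}: {row['links']} links, "
              f"{row['per_link_share']:.4f} per link, "
              f"{row['interface_share']:.1%} of total capacity")
```

Both interfaces contribute nine links each. The per-link shares (about 0.101 for vmbr1 and 0.0101 for vmbr0) are close to the qual column above, but whether ZeroTier derives qual this way is an assumption here. Note also that all 18 links, including the nine on the 1 Gbps vmbr0, show as eligible and bonded.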
I would really appreciate any clue that helps solve this puzzle.