Hello,
At the moment I am trying to run ZeroTier in a failover setup between EdgeRouter and VyOS. For this I set up a test with a source router (router 1) that has two asymmetric WAN connections (500/50 Mbit/s). The targets are a VyOS router in a data center (router 2), which has a single (redundant) WAN connection, and another router (router 3), which also has two asymmetric WAN connections. The configs look like this:
Router 1: 192.168.100.1
{
  "settings": {
    "defaultBondingPolicy": "myPolicy",
    "policies": {
      "myPolicy": {
        "basePolicy": "balance-aware",
        "failoverInterval": 250,
        "balancePolicy": "flow-static",
        "allowFlowHashing": true,
        "rebalanceStrategy": "aggressive",
        "links": {
          "eth7": {
            "ipvPref": 4,
            "enabled": true
          },
          "eth9": {
            "ipvPref": 4,
            "enabled": true
          }
        }
      }
    },
    "peerSpecificBonds": {
      "2b400xxxxx": "myPolicy",
      "92818xxxxx": "myPolicy",
      "ac88axxxxx": "myPolicy"
    }
  }
}
Router 2: 192.168.208.1
{
  "settings": {
    "defaultBondingPolicy": "myPolicy",
    "policies": {
      "myPolicy": {
        "basePolicy": "balance-aware"
      }
    }
  }
}
Router 3: 192.168.152.1
{
  "settings": {
    "defaultBondingPolicy": "myPolicy",
    "policies": {
      "myPolicy": {
        "basePolicy": "balance-aware",
        "failoverInterval": 250,
        "balancePolicy": "flow-static",
        "allowFlowHashing": true,
        "rebalanceStrategy": "aggressive",
        "links": {
          "pppoe0": {
            "ipvPref": 4,
            "enabled": true
          },
          "pppoe1": {
            "ipvPref": 4,
            "enabled": true
          }
        }
      }
    },
    "peerSpecificBonds": {
      "2b400xxxxx": "myPolicy",
      "92818xxxxx": "myPolicy",
      "50ba9xxxxx": "myPolicy"
    }
  }
}
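Side note: local.conf must be strict JSON, and errors there are easy to miss (for example, quotes pasted from a word processor, or repeating an object key, which most parsers accept silently with the last value winning). A minimal sanity check with Python's standard library, illustrative only:

```python
# Sanity-check a ZeroTier local.conf candidate: it must be strict JSON.
# Also demonstrated: duplicate object keys do NOT raise an error in
# Python's json module (as in many parsers); the last value silently wins.
import json

def check_conf(text: str) -> str:
    try:
        json.loads(text)
        return "OK"
    except json.JSONDecodeError as e:
        return f"INVALID: {e.msg} (line {e.lineno})"

print(check_conf('{"settings": {"defaultBondingPolicy": "myPolicy"}}'))  # OK
print(check_conf('{“settings”: {}}'))  # INVALID: typographic quotes are not JSON
print(json.loads('{"a": {"x": 1}, "a": {"y": 2}}'))  # {'a': {'y': 2}}
```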
Pings to 192.168.208.1 and 192.168.152.1 look relatively unremarkable.
The latency fluctuations come from the lines themselves; they are cable (DOCSIS) connections.
PING 192.168.208.1 (192.168.208.1) 56(84) bytes of data.
64 bytes from 192.168.208.1: icmp_seq=1 ttl=64 time=11.3 ms
64 bytes from 192.168.208.1: icmp_seq=2 ttl=64 time=10.3 ms
64 bytes from 192.168.208.1: icmp_seq=3 ttl=64 time=9.44 ms
64 bytes from 192.168.208.1: icmp_seq=4 ttl=64 time=8.20 ms
64 bytes from 192.168.208.1: icmp_seq=5 ttl=64 time=15.9 ms
64 bytes from 192.168.208.1: icmp_seq=6 ttl=64 time=8.76 ms
…
--- 192.168.208.1 ping statistics ---
21 packets transmitted, 21 received, 0% packet loss, time 20027ms
rtt min/avg/max/mdev = 8.071/20.983/199.625/40.231 ms
12 packets transmitted, 12 received, 0% packet loss, time 11015ms
rtt min/avg/max/mdev = 14.003/22.258/50.162/11.000 ms
root@ml-sc-100:/var/lib/zerotier-one# ping 192.168.152.1
PING 192.168.152.1 (192.168.152.1) 56(84) bytes of data.
64 bytes from 192.168.152.1: icmp_seq=1 ttl=64 time=19.7 ms
64 bytes from 192.168.152.1: icmp_seq=2 ttl=64 time=22.3 ms
64 bytes from 192.168.152.1: icmp_seq=3 ttl=64 time=22.3 ms
64 bytes from 192.168.152.1: icmp_seq=4 ttl=64 time=13.9 ms
64 bytes from 192.168.152.1: icmp_seq=5 ttl=64 time=17.9 ms
…
--- 192.168.152.1 ping statistics ---
23 packets transmitted, 23 received, 0% packet loss, time 22002ms
rtt min/avg/max/mdev = 13.800/20.361/47.645/7.860 ms
The overall CPU load is about 5%; zerotier-one alone uses about 10% of one core.
Now I put upload load on eth7 of router 1 and saturate that upload line completely. In theory, ZeroTier should now route the tunnel entirely over eth9. At least that is how I understand "balancePolicy": "flow-static": the quality of eth7 should now be so bad that traffic is only routed via eth9.
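My understanding of flow-static, as a toy sketch only (not ZeroTier's actual implementation): each flow is hashed once to a link and stays pinned there, so a single ping stream is one flow riding one physical link until the bonding layer decides to reassign it:

```python
# Toy model of "flow-static" assignment (illustrative; not ZeroTier's code).
# A flow is identified by a 5-tuple-like key and hashed ONCE onto a link.
# It stays there until a quality-triggered reassignment, which is why a
# single ping pins to one physical link instead of spraying across both.
import hashlib

LINKS = ["eth7", "eth9"]

def assign_link(flow: tuple, links: list) -> str:
    """Stable hash of the flow key -> always the same link for this flow."""
    digest = hashlib.sha256(repr(flow).encode()).hexdigest()
    return links[int(digest, 16) % len(links)]

ping_flow = ("192.168.100.1", "192.168.208.1", "icmp")
first = assign_link(ping_flow, LINKS)
# Repeated packets of the same flow always take the same link:
assert all(assign_link(ping_flow, LINKS) == first for _ in range(100))
print(f"ping flow pinned to {first}")
```

So if the ping flow happens to be pinned to the saturated link, every packet of that flow suffers until the bond actually reassigns it.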
PING 192.168.208.1 (192.168.208.1) 56(84) bytes of data.
64 bytes from 192.168.208.1: icmp_seq=1 ttl=64 time=15.3 ms
64 bytes from 192.168.208.1: icmp_seq=2 ttl=64 time=24.7 ms
64 bytes from 192.168.208.1: icmp_seq=3 ttl=64 time=20.1 ms
64 bytes from 192.168.208.1: icmp_seq=4 ttl=64 time=9.29 ms
64 bytes from 192.168.208.1: icmp_seq=5 ttl=64 time=9.53 ms
64 bytes from 192.168.208.1: icmp_seq=6 ttl=64 time=13.9 ms
64 bytes from 192.168.208.1: icmp_seq=7 ttl=64 time=23.5 ms
64 bytes from 192.168.208.1: icmp_seq=9 ttl=64 time=36.3 ms
64 bytes from 192.168.208.1: icmp_seq=10 ttl=64 time=24.2 ms
64 bytes from 192.168.208.1: icmp_seq=12 ttl=64 time=9.78 ms
64 bytes from 192.168.208.1: icmp_seq=14 ttl=64 time=12.0 ms
64 bytes from 192.168.208.1: icmp_seq=15 ttl=64 time=8.96 ms
64 bytes from 192.168.208.1: icmp_seq=16 ttl=64 time=9.13 ms
64 bytes from 192.168.208.1: icmp_seq=18 ttl=64 time=11.0 ms
64 bytes from 192.168.208.1: icmp_seq=19 ttl=64 time=11.2 ms
64 bytes from 192.168.208.1: icmp_seq=21 ttl=64 time=17.8 ms
64 bytes from 192.168.208.1: icmp_seq=22 ttl=64 time=9.58 ms
64 bytes from 192.168.208.1: icmp_seq=23 ttl=64 time=12.2 ms
64 bytes from 192.168.208.1: icmp_seq=27 ttl=64 time=13.9 ms
64 bytes from 192.168.208.1: icmp_seq=28 ttl=64 time=14.9 ms
64 bytes from 192.168.208.1: icmp_seq=31 ttl=64 time=30.7 ms
64 bytes from 192.168.208.1: icmp_seq=32 ttl=64 time=10.1 ms
^C
--- 192.168.208.1 ping statistics ---
32 packets transmitted, 22 received, 31% packet loss, time 31027ms
rtt min/avg/max/mdev = 8.969/15.859/36.373/7.507 ms
root@ml-sc-100:/var/lib/zerotier-one# ping 192.168.152.1
PING 192.168.152.1 (192.168.152.1) 56(84) bytes of data.
64 bytes from 192.168.152.1: icmp_seq=1 ttl=64 time=19.7 ms
64 bytes from 192.168.152.1: icmp_seq=2 ttl=64 time=14.3 ms
64 bytes from 192.168.152.1: icmp_seq=4 ttl=64 time=15.5 ms
64 bytes from 192.168.152.1: icmp_seq=8 ttl=64 time=15.1 ms
64 bytes from 192.168.152.1: icmp_seq=9 ttl=64 time=22.8 ms
64 bytes from 192.168.152.1: icmp_seq=12 ttl=64 time=13.6 ms
64 bytes from 192.168.152.1: icmp_seq=14 ttl=64 time=15.5 ms
64 bytes from 192.168.152.1: icmp_seq=15 ttl=64 time=15.0 ms
64 bytes from 192.168.152.1: icmp_seq=18 ttl=64 time=20.4 ms
64 bytes from 192.168.152.1: icmp_seq=20 ttl=64 time=13.4 ms
64 bytes from 192.168.152.1: icmp_seq=22 ttl=64 time=19.7 ms
64 bytes from 192.168.152.1: icmp_seq=24 ttl=64 time=43.5 ms
64 bytes from 192.168.152.1: icmp_seq=25 ttl=64 time=18.5 ms
64 bytes from 192.168.152.1: icmp_seq=27 ttl=64 time=24.9 ms
64 bytes from 192.168.152.1: icmp_seq=28 ttl=64 time=23.8 ms
64 bytes from 192.168.152.1: icmp_seq=29 ttl=64 time=14.3 ms
64 bytes from 192.168.152.1: icmp_seq=32 ttl=64 time=16.4 ms
64 bytes from 192.168.152.1: icmp_seq=33 ttl=64 time=15.2 ms
^C
--- 192.168.152.1 ping statistics ---
33 packets transmitted, 18 received, 45% packet loss, time 31991ms
rtt min/avg/max/mdev = 13.451/19.018/43.527/6.897 ms
As you can see, heavy packet loss occurs on both paths. What is the reason for this? It seems to contradict the theory behind balance-aware, doesn't it?
If you then also put some load on the tunnel itself, around 2-6 MB/s, the round-trip times go up as well, into the range of 40-60 ms (even though there is no other traffic on eth9). In addition, every 5-10 packets the RTT jumps to a three- or four-digit value. The packet loss remains the same.
So something is going wrong here.
regards
Christian
One more striking thing: the packet loss does not normalize even once there is no longer any load on eth7!