Multipath balance-aware or balance-xor Questions

it-service · December 10, 2020, 9:12pm

Hello,

At the moment I am trying to run Zerotier on Edgerouter and Vyos failsafe. For this I have made a test with an output router (Router 1) which has two asyncronous WAN connections (500/50). The target is a Vyos router in a data center (router 2), which has only one WAN connection (which is redundant) and another router (router 3), which also has two asyncronous WAN connections. The config looks like this:

Router 1: 192.168.100.1

{
“settings”: {
“defaultBondingPolicy”: “myPolicy”,
“policies”: {
“myPolicy”: {
“basePolicy”: “balance-aware”,
“failoverInterval”: 250,
“balancePolicy”: “flow-static”,
“allowFlowHashing”: true,
“rebalanceStrategy”: “aggressive”,
“links”: {
“eth7”: {
“ipvPref”: 4,
“enabled”: true
},
“eth9”: {
“ipvPref”: 4,
“enabled”: true
}
}
}
},
“peerSpecificBonds”: { “2b400xxxxx”:“myPolicy” },
“peerSpecificBonds”: { “92818xxxxx”:“myPolicy” },
“peerSpecificBonds”: { “ac88axxxxx”:“myPolicy” }
}
}

Router 2: 192.168.208.1

{
“settings”: {
“defaultBondingPolicy”: “myPolicy”,
“policies”:{
“myPolicy”: {
“basePolicy”: “balance-aware”
}
}
}
}

Router 3: 192.168.152.1

{
“settings”: {
“defaultBondingPolicy”: “myPolicy”,
“policies”: {
“myPolicy”: {
“basePolicy”: “balance-aware”,
“failoverInterval”: 250,
“balancePolicy”: “flow-static”,
“allowFlowHashing”: true,
“rebalanceStrategy”: “aggressive”,
“links”: {
“pppoe0”: {
“ipvPref”: 4,
“enabled”: true
},
“pppoe1”: {
“ipvPref”: 4,
“enabled”: true
}
}
}
},
“peerSpecificBonds”: { “2b400xxxxx”:“myPolicy” },
“peerSpecificBonds”: { “92818xxxxx”:“myPolicy” },
“peerSpecificBonds”: { “50ba9xxxxx”:“myPolicy” }
}
}

A ping to 192.168.2081 and 152.1 look relatively unobtrusive.
The pregnancies are coming from the line. They are cable lines.

PING 192.168.208.1 (192.168.208.1) 56(84) bytes of data.
64 bytes from 192.168.208.1: icmp_seq=1 ttl=64 time=11.3 ms
64 bytes from 192.168.208.1: icmp_seq=2 ttl=64 time=10.3 ms
64 bytes from 192.168.208.1: icmp_seq=3 ttl=64 time=9.44 ms
64 bytes from 192.168.208.1: icmp_seq=4 ttl=64 time=8.20 ms
64 bytes from 192.168.208.1: icmp_seq=5 ttl=64 time=15.9 ms
64 bytes from 192.168.208.1: icmp_seq=6 ttl=64 time=8.76 ms
…
— 192.168.208.1 ping statistics —
21 packets transmitted, 21 received, 0% packet loss, time 20027ms
rtt min/avg/max/mdev = 8.071/20.983/199.625/40.231 ms

12 packets transmitted, 12 received, 0% packet loss, time 11015ms
rtt min/avg/max/mdev = 14.003/22.258/50.162/11.000 ms
root@ml-sc-100:/var/lib/zerotier-one# ping 192.168.152.1
PING 192.168.152.1 (192.168.152.1) 56(84) bytes of data.
64 bytes from 192.168.152.1: icmp_seq=1 ttl=64 time=19.7 ms
64 bytes from 192.168.152.1: icmp_seq=2 ttl=64 time=22.3 ms
64 bytes from 192.168.152.1: icmp_seq=3 ttl=64 time=22.3 ms
64 bytes from 192.168.152.1: icmp_seq=4 ttl=64 time=13.9 ms
64 bytes from 192.168.152.1: icmp_seq=5 ttl=64 time=17.9 ms
…
— 192.168.152.1 ping statistics —
23 packets transmitted, 23 received, 0% packet loss, time 22002ms
rtt min/avg/max/mdev = 13.800/20.361/47.645/7.860 ms

The CPU load is about 5% over all.
Zerotier-One alone about 10% (from one core).

Now I give an upmod to ETH7 from router one. And I load the upload line completely. The theory would now say that Zerotier routes the tunnel completely over ETH9. At least I understand the entry “balancePolicy”: “flow-static” that way. The quality of ETH7 should now be so bad that it only routes via ETH9.

PING 192.168.208.1 (192.168.208.1) 56(84) bytes of data.
64 bytes from 192.168.208.1: icmp_seq=1 ttl=64 time=15.3 ms
64 bytes from 192.168.208.1: icmp_seq=2 ttl=64 time=24.7 ms
64 bytes from 192.168.208.1: icmp_seq=3 ttl=64 time=20.1 ms
64 bytes from 192.168.208.1: icmp_seq=4 ttl=64 time=9.29 ms
64 bytes from 192.168.208.1: icmp_seq=5 ttl=64 time=9.53 ms
64 bytes from 192.168.208.1: icmp_seq=6 ttl=64 time=13.9 ms
64 bytes from 192.168.208.1: icmp_seq=7 ttl=64 time=23.5 ms
64 bytes from 192.168.208.1: icmp_seq=9 ttl=64 time=36.3 ms
64 bytes from 192.168.208.1: icmp_seq=10 ttl=64 time=24.2 ms
64 bytes from 192.168.208.1: icmp_seq=12 ttl=64 time=9.78 ms
64 bytes from 192.168.208.1: icmp_seq=14 ttl=64 time=12.0 ms
64 bytes from 192.168.208.1: icmp_seq=15 ttl=64 time=8.96 ms
64 bytes from 192.168.208.1: icmp_seq=16 ttl=64 time=9.13 ms
64 bytes from 192.168.208.1: icmp_seq=18 ttl=64 time=11.0 ms
64 bytes from 192.168.208.1: icmp_seq=19 ttl=64 time=11.2 ms
64 bytes from 192.168.208.1: icmp_seq=21 ttl=64 time=17.8 ms
64 bytes from 192.168.208.1: icmp_seq=22 ttl=64 time=9.58 ms
64 bytes from 192.168.208.1: icmp_seq=23 ttl=64 time=12.2 ms
64 bytes from 192.168.208.1: icmp_seq=27 ttl=64 time=13.9 ms
64 bytes from 192.168.208.1: icmp_seq=28 ttl=64 time=14.9 ms
64 bytes from 192.168.208.1: icmp_seq=31 ttl=64 time=30.7 ms
64 bytes from 192.168.208.1: icmp_seq=32 ttl=64 time=10.1 ms
^C
— 192.168.208.1 ping statistics —
32 packets transmitted, 22 received, 31% packet loss, time 31027ms
rtt min/avg/max/mdev = 8.969/15.859/36.373/7.507 ms

32 packets transmitted, 22 received, 31% packet loss, time 31027ms
rtt min/avg/max/mdev = 8.969/15.859/36.373/7.507 ms
root@ml-sc-100:/var/lib/zerotier-one# ping 192.168.152.1
PING 192.168.152.1 (192.168.152.1) 56(84) bytes of data.
64 bytes from 192.168.152.1: icmp_seq=1 ttl=64 time=19.7 ms
64 bytes from 192.168.152.1: icmp_seq=2 ttl=64 time=14.3 ms
64 bytes from 192.168.152.1: icmp_seq=4 ttl=64 time=15.5 ms
64 bytes from 192.168.152.1: icmp_seq=8 ttl=64 time=15.1 ms
64 bytes from 192.168.152.1: icmp_seq=9 ttl=64 time=22.8 ms
64 bytes from 192.168.152.1: icmp_seq=12 ttl=64 time=13.6 ms
64 bytes from 192.168.152.1: icmp_seq=14 ttl=64 time=15.5 ms
64 bytes from 192.168.152.1: icmp_seq=15 ttl=64 time=15.0 ms
64 bytes from 192.168.152.1: icmp_seq=18 ttl=64 time=20.4 ms
64 bytes from 192.168.152.1: icmp_seq=20 ttl=64 time=13.4 ms
64 bytes from 192.168.152.1: icmp_seq=22 ttl=64 time=19.7 ms
64 bytes from 192.168.152.1: icmp_seq=24 ttl=64 time=43.5 ms
64 bytes from 192.168.152.1: icmp_seq=25 ttl=64 time=18.5 ms
64 bytes from 192.168.152.1: icmp_seq=27 ttl=64 time=24.9 ms
64 bytes from 192.168.152.1: icmp_seq=28 ttl=64 time=23.8 ms
64 bytes from 192.168.152.1: icmp_seq=29 ttl=64 time=14.3 ms
64 bytes from 192.168.152.1: icmp_seq=32 ttl=64 time=16.4 ms
64 bytes from 192.168.152.1: icmp_seq=33 ttl=64 time=15.2 ms
^C
— 192.168.152.1 ping statistics —
33 packets transmitted, 18 received, 45% packet loss, time 31991ms
rtt min/avg/max/mdev = 13.451/19.018/43.527/6.897 ms

As you can see, large packet losses occur on both lines. What is the reason for this? It actually contradicts the theory of balance-aware, doesn’t it?
If you now also put some load on the tunnel, like 2-6 MB, the packet runtime also goes up. In the range of 40-60ms (although there is no other traffic on eth9.) In addition, every 5-10 packets goes to a three - four digit runtime. The packet dropouts remain the same.
So something is going wrong

regards

Christian

Something else that is striking. The packet loss does not normalize when there is no load on eth7!

it-service · December 10, 2020, 9:44pm

By the way, this is how it looks with an XOR Config:

PING 192.168.208.1 (192.168.208.1) 56(84) bytes of data.
64 bytes from 192.168.208.1: icmp_seq=1 ttl=64 time=34.5 ms
64 bytes from 192.168.208.1: icmp_seq=2 ttl=64 time=38.0 ms
64 bytes from 192.168.208.1: icmp_seq=3 ttl=64 time=13.3 ms
64 bytes from 192.168.208.1: icmp_seq=4 ttl=64 time=44.5 ms
64 bytes from 192.168.208.1: icmp_seq=5 ttl=64 time=107 ms
64 bytes from 192.168.208.1: icmp_seq=6 ttl=64 time=112 ms
64 bytes from 192.168.208.1: icmp_seq=7 ttl=64 time=9.57 ms
64 bytes from 192.168.208.1: icmp_seq=8 ttl=64 time=11.5 ms
64 bytes from 192.168.208.1: icmp_seq=9 ttl=64 time=10.0 ms
64 bytes from 192.168.208.1: icmp_seq=11 ttl=64 time=10.4 ms
64 bytes from 192.168.208.1: icmp_seq=10 ttl=64 time=1910 ms
64 bytes from 192.168.208.1: icmp_seq=12 ttl=64 time=17.8 ms
64 bytes from 192.168.208.1: icmp_seq=16 ttl=64 time=12.7 ms
64 bytes from 192.168.208.1: icmp_seq=17 ttl=64 time=42.8 ms
64 bytes from 192.168.208.1: icmp_seq=19 ttl=64 time=8.36 ms
64 bytes from 192.168.208.1: icmp_seq=23 ttl=64 time=9.73 ms
64 bytes from 192.168.208.1: icmp_seq=13 ttl=64 time=10533 ms
64 bytes from 192.168.208.1: icmp_seq=14 ttl=64 time=9580 ms
64 bytes from 192.168.208.1: icmp_seq=15 ttl=64 time=8569 ms
64 bytes from 192.168.208.1: icmp_seq=18 ttl=64 time=5712 ms
64 bytes from 192.168.208.1: icmp_seq=20 ttl=64 time=3773 ms
64 bytes from 192.168.208.1: icmp_seq=21 ttl=64 time=2804 ms
64 bytes from 192.168.208.1: icmp_seq=26 ttl=64 time=25.7 ms
64 bytes from 192.168.208.1: icmp_seq=29 ttl=64 time=10.7 ms
64 bytes from 192.168.208.1: icmp_seq=30 ttl=64 time=17.4 ms
64 bytes from 192.168.208.1: icmp_seq=31 ttl=64 time=17.8 ms
^C
— 192.168.208.1 ping statistics —
31 packets transmitted, 26 received, 16% packet loss, time 30094ms
rtt min/avg/max/mdev = 8.360/1670.841/10533.825/3174.591 ms, pipe 11

Here is the config for it:

{
“settings”: {
“defaultBondingPolicy”: “myPolicy”,
“policies”: {
“myPolicy”: {
“basePolicy”: “balance-xor”,
“links”: {
“eth7”: {
“ipvPref”: 4,
“failoverTo”: “eth9”,
“mode”: “primary”,
“enabled”: true
},
“eth9”: {
“ipvPref”: 4,
“failoverTo”: “eth7”,
“mode”: “primary”,
“enabled”: true
}
}
}
},
“peerSpecificBonds”: { “2b400xxxxx”:“myPolicy” },
“peerSpecificBonds”: { “92818xxxxx”:“myPolicy” },
“peerSpecificBonds”: { “50ba9xxxxx”:“myPolicy” }
}
}

regards

Christian

it-service · December 17, 2020, 11:00am

Hello dear support. Does no one have an idea?

regards

Christian

it-service · December 22, 2020, 3:54pm

The basic question is, does the construct work at all with version 1.6.2?
So two WAN connections on one WAN connection.

system · December 29, 2020, 3:54pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.