Bonds + Multipath

I’ve been looking at GitHub and partially answered my own question …

		if (arg2 == "enable") {
			fprintf(stderr, "zerotier-cli bond <peerId> enable\n");
			return 0;
		}

It appears that all bond commands except show and listing bonds are still not implemented in the CLI…

The question therefore becomes: how exactly can I get this working via JSON?

I’m starting to think I’ve missed something fundamental here, but I’m not sure how or what…

I have an OPNsense router with two WAN connections (ue0 and ue1); both are effectively identical. Behind the OPNsense network are a few client machines that need access to IPs behind my remote server (Debian), which has a single (very fast) WAN connection (enp34s0).

My ‘dream’ solution here is having a machine on the local LAN with a ZeroTier client (gigabit LAN connection) connecting into the ZeroTier SD-WAN, and using OPNsense to then relay via a load-balanced uplink to the remote server.

In my setup there are (for now) 3 machines:
435365ab46 (OPNsense, dual WAN)
64198ff193 (LAN client, behind OPNsense)
8ed670dc1b (remote server)

From how I have read everything, the first phase is just getting a bonded connection to the remote server. However, even after a switch to 1.6.1 the bonding commands are not working as described in the documentation?

root@OPNsense:~ # zerotier-cli status
200 info 435365ab46 1.6.1 ONLINE
root@OPNsense:~ # zerotier-cli bond 8ed670dc1b enable
zerotier-cli bond <peerId> enable
root@OPNsense:~ #

root@fi ~ # zerotier-cli status
200 info 8ed670dc1b 1.6.1 ONLINE
root@fi ~ # zerotier-cli bond 435365ab46 enable
zerotier-cli bond <peerId> enable
root@fi ~ #

I have also (on the OPNsense side) tried a local JSON configuration:

{
    "settings": {
        "defaultBondingPolicy": "myPolicy",
        "myPolicy": { "balancePolicy": "flow-dynamic" },
        "policies": {
            "myPolicy": {
                "links": {
                    "ue0": {
                        "ipvPref": 4,
                        "failoverTo": "ue1",
                        "mode": "primary",
                        "enabled": true
                    },
                    "ue1": {
                        "ipvPref": 4,
                        "failoverTo": "ue0",
                        "mode": "primary",
                        "enabled": true
                    }
                }
            }
        },
        "peerSpecificBonds": { "8ed670dc1b": "myPolicy" }
    }
}

This was frankensteined from various parts of the documentation… and zerotier-cli simply hangs after it’s added (so something is obviously wrong in my configuration).

Any advice or suggestions?

Just bumping this back up a little…

Any thoughts on this, or any ‘how to’ for a non-CLI multipath implementation? With no way to create a bond via the CLI, the only other way is via the config file (JSON), which I’ve attempted… If someone points me in the right direction I will happily work at it until it works. If need be I’ll do a mammoth session tomorrow going through the code to see how and where it is implemented, but that is massively more work than a sample config.

It’s clear people have had multipath working as far back as 1.5.0, and it’s also evident that it’s about to get pretty simple… but the CLI tool has no working implementation to create a bond, only to query one, and the only documentation is for an implementation that is either not yet public or aspirational?

Once I get this working, I’ll happily try to help others… but at the moment, having no starting point makes for a very steep learning curve :wink:

Hello @AHarris,

Thank you for testing this out and reporting the issue. The CLI is indeed still maturing and its functionality lags the documentation somewhat. Sorry for the inconvenience!

First, I am interested in knowing more about the crashing (what platform?). I’ve tried your config and got the following on stderr:

error: no base policy was specified for custom policy (myPolicy)
error: custom policy (myPolicy) is invalid, unknown base policy ().
error: unknown policy (myPolicy) specified by defaultBondingPolicy, link disabled.

For me, ZT did not crash, and these errors are expected since the custom policy myPolicy doesn’t have a basePolicy specified. This is required because ZT needs to know what family of behaviors to give this bond. I would suggest using balance-xor for now as this will hash protocol flows across both of your links.

Additionally, with balance-xor the default behavior is to fail over to the remaining links, so you can probably omit failoverTo unless there are other links you definitely do not want to be used.

I would try something like the following:

{
    "settings": {
        "defaultBondingPolicy": "myPolicy",
        "myPolicy": { "basePolicy": "balance-xor" },
        "policies": {
            "myPolicy": {
                "links": {
                    "ue0": {
                        "ipvPref": 4,
                        "failoverTo": "ue1",
                        "mode": "primary",
                        "enabled": true
                    },
                    "ue1": {
                        "ipvPref": 4,
                        "failoverTo": "ue0",
                        "mode": "primary",
                        "enabled": true
                    }
                }
            }
        },
        "peerSpecificBonds": { "8ed670dc1b": "myPolicy" }
    }
}

I’ll keep an eye on this thread so definitely report back if this doesn’t solve your problem!

I’ve also been struggling with this all night. I’ve tried many, many local.conf configurations and also noticed the bond command not working. I run this in headless mode in a Debian 10 LXC. I need bonding to work with my two PPPoE connections. The bond doesn’t start automatically and there’s seemingly no way to start it.

So, I’ve tried your config, and I continue to get no policy:

root@OPNsense:/var/db/zerotier-one # /usr/local/sbin/zerotier-one
error: no base policy was specified for custom policy (myPolicy)
error: custom policy (myPolicy) is invalid, unknown base policy ().
error: unknown policy (myPolicy) specified by defaultBondingPolicy, link disabled.
^Croot@OPNsense:/var/db/zerotier-one #

If I take it to its simplest level:

{
    "settings": {
        "defaultBondingPolicy": "balance-xor",
        "peerSpecificBonds": { "8ed670dc1b":"balance-xor" }
    }
}

It starts normally… but without a bond anyway.
root@OPNsense:/var/db/zerotier-one # /usr/local/sbin/zerotier-one -d
root@OPNsense:/var/db/zerotier-one # ztagim5o457groo

root@OPNsense:/var/db/zerotier-one # zerotier-cli listbonds
    <peer>                        <bondtype>    <status>    <links>
      NONE                              NONE        NONE       NONE
root@OPNsense:/var/db/zerotier-one #

(This is essentially back to where I started, and why I thought I was going a bit nuts.) It’s essentially what I did initially and tried to grow from…

Also a (possible) bug with tap interfaces: ZeroTier often exits without properly cleaning up its tap device(s), so on the next start it creates tap10000 etc…

root@OPNsense:/var/db/zerotier-one # zerotier-one
ifconfig: ioctl SIOCSIFNAME (set name): File exists
ERROR: unable to configure virtual network port: ifconfig rename operation failed

This is pretty much only resolvable by stopping ZeroTier and unloading the if_tap kernel module on the server.

root@OPNsense:/var/db/zerotier-one # ifconfig | grep zt
ztagim5o457groo: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 5000 mtu 2800
root@OPNsense:/var/db/zerotier-one # ifconfig | grep tap
groups: tap
tap9994: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
groups: tap
root@OPNsense:/var/db/zerotier-one # zerotier-one
ifconfig: ioctl SIOCSIFNAME (set name): File exists
ERROR: unable to configure virtual network port: ifconfig rename operation failed
^Croot@OPNsense:/var/db/zerotier-one # zerotier-one
ifconfig: ioctl SIOCSIFNAME (set name): File exists
ERROR: unable to configure virtual network port: ifconfig rename operation failed
^Croot@OPNsense:/var/db/zerotier-one # ifconfig | grep zt
ztagim5o457groo: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 5000 mtu 2800
root@OPNsense:/var/db/zerotier-one # ifconfig | grep tap
groups: tap
tap9994: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
groups: tap
tap9995: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
groups: tap
tap9996: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
groups: tap
root@OPNsense:/var/db/zerotier-one #
root@OPNsense:/var/db/zerotier-one # kldunload if_tap
root@OPNsense:/var/db/zerotier-one # kldload if_tap
root@OPNsense:/var/db/zerotier-one # zerotier-one
ztagim5o457groo
^Croot@OPNsense:/var/db/zerotier-one #
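
For anyone wanting to spot leftover tap devices before restarting, here is a minimal Python sketch. It is not part of ZeroTier; it only assumes that stale devices show up in ifconfig output as lines beginning with tapNNNN:, as in the paste above. The function name leftover_taps is made up for illustration.

```python
import re
from typing import List


def leftover_taps(ifconfig_output: str) -> List[str]:
    """Return the names of tapNNNN interfaces found in ifconfig output."""
    names = []
    for line in ifconfig_output.splitlines():
        # Interface header lines start at column 0 with "<name>:".
        match = re.match(r"^(tap\d+):", line)
        if match:
            names.append(match.group(1))
    return names


# Sample text shaped like the ifconfig output pasted in this thread.
sample = """\
ztagim5o457groo: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 5000 mtu 2800
tap9994: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        groups: tap
tap9995: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        groups: tap
tap9996: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        groups: tap
"""

print(leftover_taps(sample))
```

If the list is non-empty while ZeroTier is stopped, those are the devices the rename is likely colliding with.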

So, with considerable shuffling, and much core-dumping …

I can get a bonded link up showing as healthy, by ONLY using default policy…

If I want to be able to ‘listbonds’ on the BSD side, I need to make sure I use balance-aware (balance-xor still hangs).

Currently, any config more complex than:

{
    "settings": {
        "defaultBondingPolicy": "balance-aware",
        "balance-aware": {
            "allowFlowHashing": true,
            "rebalanceStrategy": "aggressive"
        }
    }
}

is a recipe for either a core dump or the config simply refusing to apply… 99% of the time on the FreeBSD (12.1) end.

However, even with both ends showing the other as a bond:

root@OPNsense:/var/db/zerotier-one # uname -a
FreeBSD OPNsense.localdomain 12.1-RELEASE-p10-HBSD FreeBSD 12.1-RELEASE-p10-HBSD #0  6e16e28f1bf(stable/20.7)-dirty: Tue Oct 20 13:30:19 CEST 2020     root@sensey64:/usr/obj/usr/src/amd64.amd64/sys/SMP  amd64
root@OPNsense:/var/db/zerotier-one # zerotier-cli listbonds
    <peer>                        <bondtype>    <status>    <links>
8ed670dc1b                     balance-aware     Healthy        2/2
root@OPNsense:/var/db/zerotier-one #

root@fi /var/lib/zerotier-one # uname -a
Linux fi 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64 GNU/Linux
root@fi /var/lib/zerotier-one # zerotier-cli listbonds

3a46f1bf30 balance-aware Degraded 1/2
435365ab46 balance-aware Healthy 2/2
62f865ae71 balance-aware Degraded 1/2
64198ff193 balance-aware Healthy 2/2
778cde7190 balance-aware Degraded 1/2
992fcf1db7 balance-aware Degraded 1/2
root@fi /var/lib/zerotier-one #

Linux is showing both BSD and Windows (behind BSD) as balance-aware (is balancing enabled by default in the Windows client?), plus several ZeroTier planets as degraded. BSD is showing ONLY Linux. And on testing, no balancing is happening regardless: wan1 shows a spike in throughput when I pull a file from the Linux box, but wan0 stays idle, and the speed is the same as it was before the bond was created.

so it’s clucking like a chicken, but it’s not a chicken yet :wink:

This doesn’t give me a policy error (but still doesn’t create any bonds):

{
    "settings": {
        "defaultBondingPolicy": "myPolicy",
        "myPolicy": { "basePolicy": "balance-xor" },
        "policies": {
            "myPolicy": {
                "basePolicy": "balance-xor",
                "links": {
                    "ppp0": {
                        "ipvPref": 4,
                        "mode": "primary",
                        "enabled": true
                    },
                    "ppp1": {
                        "ipvPref": 4,
                        "mode": "primary",
                        "enabled": true
                    }
                }
            }
        },
        "peerSpecificBonds": { "509688e656": "myPolicy" }
    }
}

I also can’t ever create a bond, even with the most basic configurations, so you’re doing better than I am.

Do you have bonding enabled on both ends? If you’ve only set a bonding policy on one end, it doesn’t seem to negotiate with the remote.

In my configuration, as I’ve gone back to basics, I’ve got the same on both ends.

No, but I only have one WAN connection on the “server” side. I have tried adding a simple bonding policy on the other end too, though, with no luck.

Ok guys, thanks again for working with me on this.

When I modified your snippet I failed to notice the myPolicy definition was in the wrong place. It should be defined in the policies section like so:

{
    "settings": {
        "defaultBondingPolicy": "myPolicy",
        "policies": {
                "myPolicy": { "basePolicy": "balance-xor" }
        }
    }
}

If that works, go ahead and try the rest of your config modified like so:

{
    "settings": {
        "defaultBondingPolicy": "myPolicy",
        "policies": {
            "myPolicy": {
                "basePolicy": "balance-xor",
                "links": {
                    "ue0": {
                        "ipvPref": 4,
                        "failoverTo": "ue1",
                        "mode": "primary",
                        "enabled": true
                    },
                    "ue1": {
                        "ipvPref": 4,
                        "failoverTo": "ue0",
                        "mode": "primary",
                        "enabled": true
                    }
                }
            }
        },
        "peerSpecificBonds": { "8ed670dc1b": "myPolicy" }
    }
}
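
The placement rule described in this post (custom policies live under settings.policies, and each one needs a basePolicy) can be checked mechanically before restarting the service. Below is a minimal Python sketch, not anything shipped with ZeroTier: check_bond_config and KNOWN_BASE_POLICIES are hypothetical names, and the policy set only lists the base policies mentioned in this thread, so the real list may be longer.

```python
from typing import List

# Assumption: only the base policy names that appear in this thread.
KNOWN_BASE_POLICIES = {"balance-xor", "balance-rr", "balance-aware"}


def check_bond_config(conf: dict) -> List[str]:
    """Return human-readable problems found in a parsed local.conf dict."""
    problems = []
    settings = conf.get("settings", {})
    policies = settings.get("policies", {})

    # Every custom policy must name a base policy.
    for name, policy in policies.items():
        base = policy.get("basePolicy")
        if base is None:
            problems.append("custom policy %s has no basePolicy" % name)
        elif base not in KNOWN_BASE_POLICIES:
            problems.append("custom policy %s has unknown basePolicy %s" % (name, base))

    # The default policy must be a custom policy or a known base policy.
    default = settings.get("defaultBondingPolicy")
    if default is not None and default not in policies and default not in KNOWN_BASE_POLICIES:
        problems.append("defaultBondingPolicy %s is not defined under policies" % default)

    # Same rule for per-peer assignments.
    for peer, name in settings.get("peerSpecificBonds", {}).items():
        if name not in policies and name not in KNOWN_BASE_POLICIES:
            problems.append("peer %s refers to undefined policy %s" % (peer, name))
    return problems


# The earlier layout: basePolicy sits beside "policies" instead of inside it.
broken = {"settings": {
    "defaultBondingPolicy": "myPolicy",
    "myPolicy": {"basePolicy": "balance-xor"},
    "policies": {"myPolicy": {"links": {}}},
}}

# The corrected layout from this post.
fixed = {"settings": {
    "defaultBondingPolicy": "myPolicy",
    "policies": {"myPolicy": {"basePolicy": "balance-xor"}},
}}

print(check_bond_config(broken))
print(check_bond_config(fixed))
```

To use it on a real file, load it with json.load first and pass the resulting dict in; a config that passes this check can of course still be rejected by the service for other reasons.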

I’ll do some testing on FreeBSD and see if I can recreate those crashes.

I didn’t realize the bond only shows up in listbonds on the OTHER side of the connection. I’m showing 2/2 healthy bond connections from the proper IP addresses of each of my ppp devices. However, they are not aggregating speed. I’ve tried balance-rr and balance-aware so far.

@sshanee Depending on the protocol you’re running balance-rr might actually make your performance worse since it will break up TCP streams. I’d use balance-xor for now if your traffic is any protocol that is sensitive to segment reordering.

Also, for your setup do you have multiple links on each end?
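
To illustrate why round-robin can hurt a single TCP stream, here is a toy Python simulation (not ZeroTier code). It assumes two links with fixed, made-up one-way delays and sends packets alternately over them; the function name arrival_order and all the numbers are for illustration only.

```python
from typing import List


def arrival_order(num_packets: int, link_delays_ms: List[float],
                  send_interval_ms: float = 1.0) -> List[int]:
    """Send packets round-robin over the links and return the
    sequence numbers in the order they reach the receiver."""
    arrivals = []
    for seq in range(num_packets):
        link = seq % len(link_delays_ms)          # round-robin link choice
        sent_at = seq * send_interval_ms
        arrivals.append((sent_at + link_delays_ms[link], seq))
    arrivals.sort()                               # order by arrival time
    return [seq for _, seq in arrivals]


# Two links with different delays: packets arrive out of order.
print(arrival_order(6, [10.0, 25.0]))   # [0, 2, 4, 1, 3, 5]

# Two links with identical delays: order is preserved.
print(arrival_order(6, [10.0, 10.0]))   # [0, 1, 2, 3, 4, 5]
```

With unequal links the receiver sees segments 2 and 4 before segment 1, which is the kind of reordering a TCP stack tends to treat as loss.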

I have two DSL modems at my home, used from a Debian 10 LXC, and the other end is a VPS with a single 250 Mbit symmetrical connection. All of the data seems to go through ppp1, which is the second link I specified in local.conf and was also the default route before ZeroTier starts. My home side doesn’t show any bond; that only shows up on the VPS side.

{
    "address": "xxxxxxxxxx",
    "bondingPolicy": 4,
    "isBonded": true,
    "isHealthy": true,
    "latency": 0,
    "numAliveLinks": 2,
    "numTotalLinks": 2,
    "paths": [
        {
            "active": true,
            "address": "xxx.xxx.xxx.xxx/9993",
            "expired": false,
            "lastReceive": 1606762893695,
            "lastSend": 1606762893703,
            "preferred": true,
            "trustedPathId": 0
        },
        {
            "active": true,
            "address": "xxx.xxx.xxx.xxx/9993",
            "expired": false,
            "lastReceive": 1606762893039,
            "lastSend": 1606762893009,
            "preferred": false,
            "trustedPathId": 0
        }

I am getting similar results: it is using its preferred WAN connection (wan1), while the actual router is very happily load balancing general traffic behind it…

The bond is showing 2 healthy links…
local → remote is showing it bonding to two physical IPs allocated on the server (single pipe)
remote → local is showing it bonding to two physical IPs allocated on two distinct pipes

    "address": "435365ab46",
    "bondingPolicy": 5,
    "isBonded": true,
    "isHealthy": true,
    "latency": 0,
    "numAliveLinks": 2,
    "numTotalLinks": 2,
    "paths": [
        {
            "active": true,
            "address": "x.x.x.x/9993",
            "expired": false,
            "lastReceive": 1606765459437,
            "lastSend": 1606765459384,
            "preferred": true,
            "trustedPathId": 0
        },
        {
            "active": true,
            "address": "x.x.x.x/46952",
            "expired": false,
            "lastReceive": 1606765459428,
            "lastSend": 1606765459384,
            "preferred": false,
            "trustedPathId": 0
        }
{
    "address": "8ed670dc1b",
    "bondingPolicy": 5,
    "isBonded": true,
    "isHealthy": true,
    "latency": 0,
    "numAliveLinks": 2,
    "numTotalLinks": 2,
    "paths": [
        {
            "active": true,
            "address": "x.x.x.x/9993",
            "expired": false,
            "lastReceive": 1606765804143,
            "lastSend": 1606765803642,
            "preferred": false,
            "trustedPathId": 0
        },
        {
            "active": true,
            "address": "x.x.x.x/9993",
            "expired": false,
            "lastReceive": 1606765804139,
            "lastSend": 1606765803638,
            "preferred": true,
            "trustedPathId": 0
        }

The above is the output from zerotier-cli listbonds -j that corresponds to the bond that’s been created. Again, testing whether there is bonding (wget http://linux-server/testfile) shows data across a single WAN connection only.
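
For comparing the two ends quickly, a single bond record like the ones pasted above can be boiled down to one line. This is a rough Python sketch using only the field names visible in that output; summarize_bond is a made-up helper, and how the full listbonds -j output is wrapped at the top level is not shown in this thread, so the sketch works on one record at a time.

```python
def summarize_bond(bond: dict) -> str:
    """Condense one bond record into a single status line."""
    alive = bond["numAliveLinks"]
    total = bond["numTotalLinks"]
    state = "healthy" if bond["isHealthy"] else "degraded"
    # Collect the paths flagged as preferred (normally one).
    preferred = [p["address"] for p in bond["paths"] if p["preferred"]]
    preferred_text = preferred[0] if preferred else "none"
    return "%s: %d/%d links, %s, preferred path %s" % (
        bond["address"], alive, total, state, preferred_text)


# Record shaped like the paste above, with the same placeholder addresses.
sample = {
    "address": "435365ab46",
    "bondingPolicy": 5,
    "isBonded": True,
    "isHealthy": True,
    "numAliveLinks": 2,
    "numTotalLinks": 2,
    "paths": [
        {"active": True, "address": "x.x.x.x/9993", "preferred": True},
        {"active": True, "address": "x.x.x.x/46952", "preferred": False},
    ],
}

print(summarize_bond(sample))
```

Note that this only reports link health; as the tests in this thread show, a 2/2 healthy bond does not by itself mean traffic is being spread over both links.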

Using one instance of wget on a balance-aware config probably won’t amount to much. As far as I know wget will create a single TCP stream. I would not advise splitting a TCP stream across multiple links because segment reordering will cause the TCP stack to suspect something is wrong and throttle itself back. You’ll get worse performance.

Let’s try something simpler like iperf with multiple streams using balance-xor. Once we can see it using both of your links we can start working from there.
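
As a rough picture of why one wget stays on one link while several iperf streams can spread out, here is a small Python sketch of flow hashing in general. It is not ZeroTier’s actual hash: pick_link, the use of CRC32, and the addresses and ports are all assumptions made for illustration.

```python
import zlib


def pick_link(flow: tuple, num_links: int) -> int:
    """Map a flow tuple to a link index using a stable hash.
    CRC32 is used here only because it is deterministic and in the stdlib."""
    key = "|".join(str(part) for part in flow).encode()
    return zlib.crc32(key) % num_links


# One TCP stream: (src ip, src port, dst ip, dst port, protocol).
single_flow = ("10.0.0.2", 40000, "10.0.1.5", 80, "tcp")

# Every packet of that stream hashes to the same link.
links_for_single = {pick_link(single_flow, 2) for _ in range(1000)}
print("single stream uses links:", links_for_single)

# Several parallel streams differ in source port, so each is hashed separately.
flows = [("10.0.0.2", 40000 + i, "10.0.1.5", 5201, "tcp") for i in range(8)]
print("parallel streams use links:", sorted({pick_link(f, 2) for f in flows}))
```

A single stream always lands on exactly one link, which keeps its segments in order. Parallel streams will usually be spread over both links, although with only a handful of flows a hash can by chance place them all on the same one.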

I was using the multi setting on speedtest.net, and torrents. It’s only using 1 ppp connection in my case.
