Asymmetric Routing Problem with Site-to-Site (SOLVED - misconfiguration)

I have a routing problem trying to set up a test site-to-site configuration. I seem to be missing a piece somewhere, but I have come up blank searching for it myself and asking in various forums.

The scenario

Two networks with internet connectivity via routers providing NAT; let's call them Site A (172.16.1.0/24) and Site B (192.168.50.0/24), with a ZeroTier site-to-site setup as described in the details below.

From a test node on Site B, I can ping all the systems on Site A and all intermediate devices: the Site B ZT gateway, the Site A ZT gateway, and a test node on the Site A network. I can also ssh into the Site A test node from the Site B test node.

From a test node on Site A, I can ping the local ZT gateway, the Site A ZT client address and the Site B ZT client address, but not the Site B ZT gateway's local network address or any other node on Site B. From the Site A ZT gateway itself, I can ping everything and connect to test nodes on the Site B network.

Based on this behavior, I suspect something is wrong with the routing on the Site B gateway, but I have tried to keep everything as symmetric as possible. I'm stumped; presumably it is something obvious.

Any thoughts?

Detailed Scenario

Two networks with internet connectivity via routers providing NAT; let's call them Site A (172.16.1.0/24) and Site B (192.168.50.0/24).

On each network, I have set up a gateway node with the ZeroTier client installed: on Site A this is a Pi 4 (172.16.1.13), on Site B an Ubuntu VM (192.168.50.28).

I have created a ZeroTier network, and both clients have joined it.

Site A gateway client has ZeroTier address 172.30.0.130
Site B gateway client has ZeroTier address 172.30.0.14
I have managed routes defined in the ZeroTier network as follows:
172.16.1.0/24 via 172.30.0.130
192.168.50.0/24 via 172.30.0.14
172.30.0.0/16 (LAN)
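To sanity-check which member a given destination should be handed to, here is a minimal Python sketch of a longest-prefix-match lookup over the managed routes above. It assumes the managed routes end up in each member's OS routing table, where the most specific prefix wins; the names `MANAGED_ROUTES` and `next_hop` are made up for illustration and are not part of any ZeroTier API.

```python
import ipaddress

# Managed routes as posted: (destination, via). None stands for "(LAN)",
# i.e. the destination is reached directly on the ZeroTier network.
MANAGED_ROUTES = [
    ("172.16.1.0/24", "172.30.0.130"),
    ("192.168.50.0/24", "172.30.0.14"),
    ("172.30.0.0/16", None),
]


def next_hop(dest, routes=MANAGED_ROUTES):
    """Return (matching prefix, via) for the most specific route, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [(ipaddress.ip_network(net), via) for net, via in routes
               if addr in ipaddress.ip_network(net)]
    if not matches:
        return None  # no managed route; the member's other routes apply
    net, via = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), via


if __name__ == "__main__":
    for dest in ("192.168.50.250", "172.16.1.13", "172.30.0.14", "8.8.8.8"):
        print(dest, "->", next_hop(dest))
```

With these routes, 192.168.50.250 resolves to `('192.168.50.0/24', '172.30.0.14')`, so on the ZT side traffic for any Site B host should be handed to the Site B gateway.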

Site A router has the ZeroTier gateway node (172.16.1.13) defined as a gateway, and routes that use that gateway (this is a pfSense box, so gateways are defined separately from routes):
Network 192.168.50.0/24 Gateway 172.16.1.13
Network 172.30.0.0/24 Gateway 172.16.1.13

Site B router has the following routes defined:
172.16.1.0/24 gateway 192.168.50.28
172.30.0.0/24 gateway 192.168.50.28
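Since the goal was to keep both sites symmetric, a small Python sketch can check that mechanically: each router should have a route to the remote LAN and to the remote ZT gateway address via its own local ZT gateway, and that gateway should sit on the local LAN. The `SITES` dictionary and its field names are a hypothetical model of the posted config, not anything pfSense or ZeroTier exposes.

```python
import ipaddress

# Illustrative model of the posted configuration (field names are hypothetical).
SITES = {
    "A": {
        "lan": "172.16.1.0/24",
        "gateway": "172.16.1.13",      # Pi 4 running ZeroTier
        "zt_addr": "172.30.0.130",
        "router_routes": {"192.168.50.0/24": "172.16.1.13",
                          "172.30.0.0/24": "172.16.1.13"},
    },
    "B": {
        "lan": "192.168.50.0/24",
        "gateway": "192.168.50.28",    # Ubuntu VM running ZeroTier
        "zt_addr": "172.30.0.14",
        "router_routes": {"172.16.1.0/24": "192.168.50.28",
                          "172.30.0.0/24": "192.168.50.28"},
    },
}


def check_site(name, sites=SITES):
    """Return a list of problems with one site's router routes (empty list = OK)."""
    site = sites[name]
    lan = ipaddress.ip_network(site["lan"])
    problems = []
    if ipaddress.ip_address(site["gateway"]) not in lan:
        problems.append(f"{name}: gateway {site['gateway']} is not on {lan}")
    routes = {ipaddress.ip_network(n): gw for n, gw in site["router_routes"].items()}
    for other, peer in sites.items():
        if other == name:
            continue
        # The peer's LAN and the peer's ZT address must both be covered by a
        # route pointing at this site's own ZT gateway.
        for target in (peer["lan"], peer["zt_addr"]):
            target_net = ipaddress.ip_network(target)  # bare address -> /32
            covering = [gw for net, gw in routes.items() if target_net.subnet_of(net)]
            if site["gateway"] not in covering:
                problems.append(f"{name}: no route to {target} via {site['gateway']}")
    return problems


if __name__ == "__main__":
    print(check_site("A"), check_site("B"))  # prints: [] []
```

Both sites come out clean, which matches the suspicion that the routes themselves were fine. One detail the check makes visible: the routers only route 172.30.0.0/24 while the ZeroTier network is 172.30.0.0/16; that covers both gateway addresses posted here, but would not cover ZT members outside 172.30.0.x.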

Site A gateway has IP forwarding enabled and the following rules.v4:
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A POSTROUTING -o eth0 -j MASQUERADE
COMMIT

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A FORWARD -i eth0 -o -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i -o eth0 -j ACCEPT
COMMIT

Site B gateway has IP forwarding enabled and the following rules.v4:
*nat
:PREROUTING ACCEPT [81:73389]
:INPUT ACCEPT [72:15727]
:OUTPUT ACCEPT [29:4502]
:POSTROUTING ACCEPT [38:62164]
-A POSTROUTING -o eth0 -j MASQUERADE
COMMIT

*filter
:INPUT ACCEPT [13202:1688754]
:FORWARD ACCEPT [15997:35715961]
:OUTPUT ACCEPT [47852:38955960]
-A FORWARD -i eth0 -o -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i -o eth0 -j ACCEPT
COMMIT
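In both rule sets as posted, the first FORWARD rule has nothing after `-o` and the second has nothing after `-i`; presumably the ZeroTier interface name got lost when posting, but it is worth checking the actual file. A rough Python sketch that scans rules.v4 text for `-i`/`-o` options with no interface name following them (the function name is made up; it does not guess what the missing name should be):

```python
import shlex

IFACE_OPTS = ("-i", "-o", "--in-interface", "--out-interface")


def missing_iface_args(rules_text):
    """Return (line_number, line) for rules where -i/-o has no interface name after it."""
    findings = []
    for lineno, line in enumerate(rules_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(("-A", "-I")):
            continue  # chain/policy lines, table headers, COMMIT, comments
        tokens = shlex.split(stripped)
        for i, tok in enumerate(tokens):
            if tok in IFACE_OPTS:
                nxt = tokens[i + 1] if i + 1 < len(tokens) else None
                if nxt is None or nxt.startswith("-"):
                    findings.append((lineno, stripped))
                    break
    return findings
```

Usage would be something like `print(missing_iface_args(open(path).read()))`, where `path` is wherever your rules.v4 lives (e.g. /etc/iptables/rules.v4 if iptables-persistent is in use). Run against either rule set as posted, it flags both FORWARD lines and leaves the MASQUERADE line alone.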

Are the clients Linux? There's this note from the full tunnel article:

Linux’s networking stack is complex and almost absurdly feature-rich. This is a good thing and a bad thing. You can do almost anything with it, probably including but not limited to IP over avian carrier. But it also has a lot of weird little edge cases that can bite.

For a Linux host to route via a ZeroTier network, you may (depending on distribution) need to change a setting called rp_filter:

sudo sysctl -w net.ipv4.conf.all.rp_filter=2

Red Hat has an article explaining the details of this. Put it in /etc/sysctl.conf to make it permanent.

Oddly enough this is not required on the gateway/router, only on participating members running Linux that want to enable allowDefault.
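If you want to confirm what the gateways are actually doing after running that sysctl, here is a minimal Python sketch. It relies on the Linux ip-sysctl documentation: 0 means no source validation, 1 is strict reverse-path filtering, 2 is loose, and the kernel uses the larger of `conf/all/rp_filter` and `conf/<iface>/rp_filter`. That is also why setting `all` to 2 is enough: 2 is the largest value, so every interface ends up at least loose. The helper names are made up for illustration.

```python
from pathlib import Path

# Meaning of the rp_filter values per the Linux ip-sysctl documentation.
MODES = {0: "off", 1: "strict", 2: "loose"}


def effective_rp_filter(all_value, iface_value):
    """The kernel applies the larger of conf/all and conf/<iface> rp_filter."""
    return MODES[max(all_value, iface_value)]


def read_rp_filter(iface, root=Path("/proc/sys/net/ipv4/conf")):
    """Read both settings from /proc and return the effective mode for iface."""
    all_value = int((root / "all" / "rp_filter").read_text())
    iface_value = int((root / iface / "rp_filter").read_text())
    return effective_rp_filter(all_value, iface_value)
```

On a gateway you would call `read_rp_filter("eth0")`, and the same for the ZeroTier interface (whatever it is named on your system). This only tells you the setting; it doesn't prove rp_filter is what is dropping the packets.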

Just to be absolutely clear, you mean the two devices I have connected to the ZT network that are (supposed to be) routing between my private IP spaces and ZT? I tried that setting and it did not seem to make any difference.

I did notice one other extremely odd behavior (i.e. I don't understand it at all). Here is a traceroute from the test node on Site A (172.16.1.0/24) to the main router at Site B (192.168.50.0/24):

$ traceroute 192.168.50.1
traceroute to 192.168.50.1 (192.168.50.1), 30 hops max, 60 byte packets
1 172.16.1.13 (172.16.1.13) 0.234 ms 0.215 ms 0.203 ms
2 172.30.0.14 (172.30.0.14) 6.907 ms 6.929 ms 6.943 ms
3 192.168.50.1 (192.168.50.1) 7.256 ms 7.272 ms 7.290 ms
$

The trace correctly picks up the ZT gateway via the Site A router (i.e. 172.16.1.13), then traverses the ZT network to the gateway on the Site B network (ZT IP 172.30.0.14) and thence to the target. Looking good.

If I repeat the same test but target a different node on the Site B network, e.g. 192.168.50.250, I get a single hop to the Site A router and then the traceroute ends, almost as if something in the Site A router config is not quite right:
$ traceroute 192.168.50.250
traceroute to 192.168.50.250 (192.168.50.250), 30 hops max, 60 byte packets
1 172.16.1.1 (172.16.1.1) 0.299 ms 0.265 ms 0.205 ms
$
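To pin down exactly where the two traces part ways, a small Python sketch can parse both outputs and report the first hop that differs. It assumes the default Linux traceroute output format shown above (hostname followed by the address in parentheses); output from `traceroute -n` would need a different pattern. The function names are made up for illustration.

```python
import re

# Matches a hop line of default Linux traceroute output, e.g.
# " 1 172.16.1.13 (172.16.1.13) 0.234 ms ..."; hops that printed "* * *"
# (no reply) and the "traceroute to ..." header do not match.
HOP_RE = re.compile(r"^\s*(\d+)\s+\S+\s+\(([\d.]+)\)")


def hops(trace_output):
    """Return [(hop_number, ip), ...] for every hop that answered."""
    result = []
    for line in trace_output.splitlines():
        m = HOP_RE.match(line)
        if m:
            result.append((int(m.group(1)), m.group(2)))
    return result


def first_divergence(trace_a, trace_b):
    """Return (hop, ip_a, ip_b) for the first hop where the traces differ, else None."""
    a, b = dict(hops(trace_a)), dict(hops(trace_b))
    for n in sorted(set(a) | set(b)):
        if a.get(n) != b.get(n):
            return n, a.get(n), b.get(n)
    return None


# The two traces posted above.
WORKING = """traceroute to 192.168.50.1 (192.168.50.1), 30 hops max, 60 byte packets
1 172.16.1.13 (172.16.1.13) 0.234 ms 0.215 ms 0.203 ms
2 172.30.0.14 (172.30.0.14) 6.907 ms 6.929 ms 6.943 ms
3 192.168.50.1 (192.168.50.1) 7.256 ms 7.272 ms 7.290 ms"""

FAILING = """traceroute to 192.168.50.250 (192.168.50.250), 30 hops max, 60 byte packets
1 172.16.1.1 (172.16.1.1) 0.299 ms 0.265 ms 0.205 ms"""

if __name__ == "__main__":
    print(first_divergence(WORKING, FAILING))  # prints: (1, '172.16.1.13', '172.16.1.1')
```

The traces already differ at hop 1: the working one answers from the ZT gateway (172.16.1.13), the failing one from the Site A router (172.16.1.1) and then stops, which points at the router rather than at Site B.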

What confuses me here (and is almost certainly down to my lack of understanding of network topology) is that pinging from Site B hosts to Site A hosts works perfectly fine, so there is definitely some route back.

Further clarification on host type.

Local/ZT gateway at Site A is running Raspbian Buster (on the Pi)
Local/ZT gateway at Site B is running an Ubuntu 20.04 VM (hosted on Ubuntu, amd64).

Thank you for the suggestions. I finally determined what the problem was: the router at Site A had rules defined by a previous user that had survived a reset, and one of them was blocking my ping attempts.
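For anyone hitting something similar: pfSense interface rules are, as far as I know, evaluated top to bottom with the first match winning, so a single leftover block rule above the normal allow rule can silently eat traffic to one subnet while everything else works. A minimal Python sketch of first-match evaluation; the actual leftover rule was not posted, so the rule list below is entirely hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    action: str  # "pass" or "block"
    proto: str   # "icmp", "tcp", "udp" or "any"
    dst: str     # destination network in CIDR notation

    def matches(self, proto, dst):
        return (self.proto in ("any", proto)
                and ipaddress.ip_address(dst) in ipaddress.ip_network(self.dst))


def evaluate(rules, proto, dst, default="block"):
    """First matching rule wins; fall through to the default action."""
    for rule in rules:
        if rule.matches(proto, dst):
            return rule.action
    return default


# Hypothetical LAN rule list: a leftover block rule sits above the allow rule.
LAN_RULES = [
    Rule("block", "icmp", "192.168.50.0/24"),  # stale rule from a previous user
    Rule("pass", "any", "0.0.0.0/0"),
]

if __name__ == "__main__":
    print(evaluate(LAN_RULES, "icmp", "192.168.50.250"))  # prints: block
    print(evaluate(LAN_RULES, "icmp", "172.30.0.14"))     # prints: pass
```

The allow-all rule never gets a say for pings into 192.168.50.0/24 because the earlier block rule matches first, which is why reviewing the whole rule list (not just the rule you added) matters after inheriting a box.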


Thanks! Glad you got it working.