Network Failure after update

While doing routing updates on my network, I appear to have broken something. I have ZeroTier on my laptop, and two Pi Zeros which act as routers.

The older of the Pi Zeros was on an old ZeroTier version for some reason (1.4.x, I think, from memory). I ran updates today and it is now on 1.6.2, but it appears not to be connecting to the ZeroTier network correctly.

My two other machines are talking through ZeroTier fine after the usual apt-get updates, but they were already on ZeroTier 1.6.2.

Can anyone suggest what may be broken by the upgrade?

Any chance we could have the installed OS and version on the older Pi? Also, do you have a spare memory card to see if the older Pi will work fine with a fresh installation? [To eliminate a hardware problem as being the cause.]

Both Pi Zeros are running the latest version of Raspbian; however, ZeroTier was installed a couple of months apart, hence the update changing the version. Unfortunately I don’t have physical access to the failed Pi, as it’s about 1000 miles away (why is it always the remote end that fails?). I do, however, run Dataplicity on a separate Pi in case of incidents like this, so I can ssh into it.

Presumably if I backed up my routes, removed ZeroTier, then reinstalled and reauthorised, that would have the same effect. Are there any config files that would need removing manually after uninstalling ZeroTier?

The configuration files tend to live in the same place in this case, at /var/lib/zerotier-one/ , so if you back this up, and then completely remove it, you’ll be working with a fresh installation. Someone who ran into issues with their configuration covered part of this here:

Okay so I have done the following.

Backed up /var/lib/zerotier-one/
Used apt-get to remove zerotier-one
Removed /var/lib/zerotier-one
Rebooted
Ran the standard curl | bash install script
Unauthorised the old ID
Authorised the new ID
Rebooted
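For anyone repeating the backup step above, here is a minimal Python sketch of one way to archive the directory before removing it. The function name and the archive file name are made up for illustration; only the /var/lib/zerotier-one path comes from this thread. Reading that directory normally needs root.

```python
import tarfile
import time
from pathlib import Path


def backup_zerotier_dir(src="/var/lib/zerotier-one", dest_dir="."):
    """Archive the ZeroTier working directory into a timestamped .tar.gz.

    Hypothetical helper: the name and archive naming scheme are illustrative.
    """
    src_path = Path(src)
    if not src_path.is_dir():
        raise FileNotFoundError(f"{src} does not exist or is not a directory")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = Path(dest_dir) / f"zerotier-one-backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores entries under "zerotier-one/..." rather than the full path
        tar.add(src_path, arcname=src_path.name)
    return archive


# Example (run as root on the Pi):
#   backup_zerotier_dir("/var/lib/zerotier-one", "/root")
```

Keeping the archive outside /var/lib/zerotier-one means it survives the removal step, and the old identity files can be restored later if the fresh install does not help.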

Nothing has changed. Both my other machines can still ping each other, but not this machine, and this machine can’t ping anything beyond its own ZeroTier IP address. Attempts to ping either of the other two ZeroTier IPs return “From 10.243.26.180 icmp_seq=23 Destination Host Unreachable”. Any ideas?

On the faulty Pi, could you run zerotier-cli status and zerotier-cli listpeers?
It will give an idea of what state zerotier is in, and if it can see anything.
The main thing is to see whether the planets are listed with an IP (showing it’s talking upstream and was able to retrieve some info). Your other machines should appear as leaves, and they should also have IPs; if they do not, then as I understand it they are not able to connect over TCP/IP to establish a ZT link.

Okay, I ran both commands, which show:

sudo zerotier-cli listpeers
200 listpeers
200 listpeers 12ac4a1e71 34.94.127.7/34610;6534;6534 -1 1.6.2 LEAF
200 listpeers 3a46f1bf30 185.180.13.82/9993;7043;1750 369 - PLANET
200 listpeers 62f865ae71 50.7.252.138/9993;7044;1665 442 - PLANET
200 listpeers 6a8d968b28 - -1 1.6.2 LEAF
200 listpeers 778cde7190 - -1 - PLANET
200 listpeers 992fcf1db7 195.181.173.159/9993;7046;1763 444 - PLANET

sudo zerotier-cli status
200 info cd45b32f63 1.6.2 ONLINE

And this is the output from a working machine

sudo zerotier-cli listpeers
200 listpeers
200 listpeers 12ac4a1e71 34.94.127.7/34611;11549;11377 168 1.6.2 LEAF
200 listpeers 3a46f1bf30 185.180.13.82/9993;6534;1369 161 - PLANET
200 listpeers 62f865ae71 50.7.252.138/9993;6535;1320 223 - PLANET
200 listpeers 6a8d968b28 192.168.1.91/9993;6535;6524 9 1.6.2 LEAF
200 listpeers 778cde7190 - -1 - PLANET
200 listpeers 992fcf1db7 195.181.173.159/9993;6537;1497 36 - PLANET

Okay, so if we look at 6a8d968b28, we can see it is visible on a working machine, and not contactable on the unit which is not working. As it [6a8d968b28] is using an internal address (192.168.1.91 on port 9993; the two numbers after the port are the milliseconds since the last send and last receive, not extra ports), we can assume a firewall is blocking traffic to 6a8d968b28. This suggests that NAT on the working machine’s router is blocking access to it; however, ZeroTier should attempt to relay traffic via a planet when this happens, and I believe it is doing that.

Notice cd45b32f63: that is the node ID of the Pi in trouble. Now, notice how it is not listed as a peer on the working machine. In that situation, the controller is not providing the details to your working machine, and the working machine does not even know it can contact cd45b32f63. I wonder if your working machine is using cached details which have not been updated.

So, back up the config for the working machine, and install ZeroTier fresh. See if it makes a difference, and if not, you can restore the backup files. It’s worth a try, as that is the bit which stands out to me.
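The comparison above can also be done mechanically. This is a rough Python sketch, assuming the 1.6.x text layout seen in this thread (node ID, path, latency, version, role, with the path written as ip/port;lastSend;lastReceive). The Peer type and parse_listpeers function are names invented for this example, not part of ZeroTier.

```python
from typing import Dict, NamedTuple, Optional


class Peer(NamedTuple):
    node_id: str
    address: Optional[str]  # "ip/port" of the direct path, None if shown as "-"
    latency: int            # -1 when no latency has been measured
    version: str
    role: str               # LEAF or PLANET in this output


def parse_listpeers(output: str) -> Dict[str, Peer]:
    """Parse 1.6.x-style `zerotier-cli listpeers` text (format assumed from this thread)."""
    peers = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows: 200 listpeers <id> <path> <latency> <version> <role>
        if len(fields) != 7 or fields[:2] != ["200", "listpeers"]:
            continue
        node_id, path, latency, version, role = fields[2:]
        try:
            latency_ms = int(latency)
        except ValueError:
            continue  # header row or unexpected format
        # Keep only the ip/port part of "ip/port;lastSend;lastReceive".
        address = None if path == "-" else path.split(";")[0]
        peers[node_id] = Peer(node_id, address, latency_ms, version, role)
    return peers


# Output copied from the posts above.
FAULTY_PI = """\
200 listpeers
200 listpeers 12ac4a1e71 34.94.127.7/34610;6534;6534 -1 1.6.2 LEAF
200 listpeers 3a46f1bf30 185.180.13.82/9993;7043;1750 369 - PLANET
200 listpeers 62f865ae71 50.7.252.138/9993;7044;1665 442 - PLANET
200 listpeers 6a8d968b28 - -1 1.6.2 LEAF
200 listpeers 778cde7190 - -1 - PLANET
200 listpeers 992fcf1db7 195.181.173.159/9993;7046;1763 444 - PLANET
"""

WORKING_MACHINE = """\
200 listpeers
200 listpeers 12ac4a1e71 34.94.127.7/34611;11549;11377 168 1.6.2 LEAF
200 listpeers 3a46f1bf30 185.180.13.82/9993;6534;1369 161 - PLANET
200 listpeers 62f865ae71 50.7.252.138/9993;6535;1320 223 - PLANET
200 listpeers 6a8d968b28 192.168.1.91/9993;6535;6524 9 1.6.2 LEAF
200 listpeers 778cde7190 - -1 - PLANET
200 listpeers 992fcf1db7 195.181.173.159/9993;6537;1497 36 - PLANET
"""

faulty = parse_listpeers(FAULTY_PI)
working = parse_listpeers(WORKING_MACHINE)

# Leaves the faulty Pi knows about but has no direct path to.
unreachable_leaves = [p.node_id for p in faulty.values()
                      if p.role == "LEAF" and p.address is None]
print(unreachable_leaves)        # ['6a8d968b28']

# Does the working machine know about the faulty Pi (cd45b32f63) at all?
print("cd45b32f63" in working)   # False
```

Both results match what was read off by eye: the faulty Pi has no path to 6a8d968b28, and the working machine does not list cd45b32f63 as a peer.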

Glad you understood those better than I did :)

Right I have done a re-install on the working pi, and unfortunately nothing has changed. Both the working pi, and my laptop can still see each other (my laptop is still on the original working setup) but neither can be seen from the non-working pi, and the non-working pi can see neither.

I’m at a loss on this one.

Okay, let’s try something a little different: make another network (you can keep the old one, so no issues there) and join all the devices to it. Sometimes that works for people, and I’m pretty sure it’d rule out a controller comms issue.

Keep in mind that if ports are not open and forwarded to devices on a network, this can cause issues. It may be that your laptop and the working Pi are both inside the same network, and that is why they are working.

(Late over this end, so this’ll be the last reply for tonight.)

Okay, well, I set up a second network, and it behaved exactly the same. So, as a temporary test, I then ssh’d into a different Pi on the same remote network and installed ZeroTier; that works fine and as expected. So my interpretation at the moment is that the ZeroTier network is working fine and neither of the local networks is blocking access in any way. The problem has to be somewhere on the Pi Zero after the upgrade. I’ll start by removing all the IP routes; if that works, I’ll build them back up step by step.

I know you’ve probably long since given up for tonight, but I’ll keep reporting back here as it helps keep my thoughts straight as well, and your persistent help is much appreciated. Thanks.

Don’t you love networks…

Just logged into all 3 machines to make a note of the routes, and run test pings. Wouldn’t you believe it, everything is working fine.

Lol, so it looks like they caught up on the second network ;) It’s weird, but I’ve found that sometimes works. Keep in mind that sometimes they can take a while until they connect fully, with no rhyme or reason for it.

And once the second network was working, the first started working, so I could completely remove the second.

Once again thanks for the help.

Hmm, this has happened again, I’m wondering if I’m expecting a little too much from the pi-zero.

This time I’ve been a little quicker to spot the problem. It appears that the processor load is only 0.3, yet the CPU is running flat-out at 100%. Unfortunately, at the moment I can’t ssh in to reboot it.

One possibility is that, because I am running ZeroTier on my laptop and my laptop is also on a network that is shared over ZeroTier, the two possible routes to my laptop are causing some sort of loop/conflict.

I have unauthorised the remote machine in ZeroTier to see if it will settle down; then I’ll try connecting from just my laptop, and not the remote network. That should rule out the loop/conflict theory. If that doesn’t solve it, I’ll upgrade it to a 3B+, but that will have to wait till I’m back on site.

You can get network loops, but I’ve only experienced that when a Zerotier network was bridged onto a network where ZT clients to the same network also existed. In that case, it caused the majority of clients to get booted due to the loop.

Keep in mind, I have been wondering if the reason things came right when the second network was on, was because all the machines were able to sync the details of the first network over the second network.

Okay, well, I got things reconnected after numerous attempts. However, this time I have stayed with just two devices: the remote device and my laptop. I’m going to leave it like that for a few days to see if it remains stable, then add the third.

I do currently have a second network set up, but have unauthorised all the devices; I assume this means that it won’t interfere. I notice that I can ping the ZeroTier IP of the local interface, but not of any other interfaces using this method. My third device on the original network is also still running but unauthorised, on the same assumption.

I can report that this has been running stable since I disabled the ZeroTier network sharing locally.

Not entirely sure how to proceed, but it appears that having an entire network shared over ZeroTier, and then also having a device on that network that is itself connected to the ZeroTier network, is the cause of the problem. For the time being I have disabled network sharing on the network that I am on. In theory I can switch networks when I am at another location.
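One way to spot this situation in advance is to check whether a machine is physically attached to a subnet that is also being routed over ZeroTier. The Python sketch below is only an illustration of that check: the function name is made up, and the addresses in the example (a laptop at 192.168.1.50, a 10.243.0.0/16 ZeroTier range, and a shared 192.168.1.0/24 LAN) are assumed values rather than the real configuration from this thread.

```python
import ipaddress


def conflicting_routes(local_addresses, managed_routes):
    """Return (address, route) pairs where a locally attached address
    falls inside a subnet that is also routed over ZeroTier.

    local_addresses: IPs of the machine's non-ZeroTier interfaces.
    managed_routes:  route targets configured on the ZeroTier network.
    """
    conflicts = []
    for addr in local_addresses:
        ip = ipaddress.ip_address(addr)
        for route in managed_routes:
            net = ipaddress.ip_network(route, strict=False)
            if ip in net:
                conflicts.append((addr, route))
    return conflicts


# Assumed example values, not taken from the real setup.
laptop_lan_addresses = ["192.168.1.50"]
zerotier_routes = ["10.243.0.0/16", "192.168.1.0/24"]

print(conflicting_routes(laptop_lan_addresses, zerotier_routes))
# [('192.168.1.50', '192.168.1.0/24')]
```

Only the machine’s ordinary LAN addresses should be passed in; its own ZeroTier address will always sit inside the ZeroTier range and is not a conflict. An empty result would suggest the machine can safely stay joined while the network share is enabled.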
