Connection fine for weeks, suddenly so slow it isn't usable

I have a Raspberry Pi 5 running Node-RED on stock Raspberry Pi OS, deployed on a site.
The site, and my home for that matter, are both on 4G internet connections (UK).

The RPi, my lappy and a tablet have all been using a ZeroTier network for weeks and performing very well when viewing the Node-RED dashboard or the Pi desktop from either machine. In addition, a couple of other devices, belonging to different people, occasionally look at this same site.

Two days ago there was a prolonged power cut, and since then the connection has been so slow as to be all but unusable. I did manage to get VNC to load and ran an internet speed test from the Pi, eventually... It reported well over 20 Mbps, and the occupier of the site said his internet was fine: subjectively, HD videos were working and web pages loaded quickly.

I have since been to the site and the Pi is fine, as is the internet connection. I added 'Raspberry connect' so I have another way to access it, avoiding the CGNAT issue, and that seems fine too.

However, if I try to view Node-RED in a browser from anywhere outside the site's LAN, using the ZeroTier network, it is still crazy slow!

Is there anywhere I can look at logs/errors? What could be causing this? As far as I am aware it's a basic stock setup, with all authorised devices manually assigned different IPs and no routing enabled! Moreover, it's been great for weeks and, other than a power outage of about 12 hours, nothing has changed.
If I am on the site, plugged into the LAN, a direct connection is fine and so is the ZeroTier network connection, but I am assuming that is because the two devices are instructed to communicate directly by the ZeroTier connectors.

I have a basic working knowledge of networking but nowhere near the knowledge/skill necessary to even understand how ZeroTier is working when it works, let alone to try and find out why it isn't working.

Please help if you can. I had just about decided to move all my sites to paid-for ZeroTier network(s), because testing has been so solid, but this has caused me significant issues.

The chat bot tells me I will need a commercial account before there is any support, which makes evaluating support somewhat hard to do!

This is the first issue I have had, but it's a game changer without an explanation!

I have literally just asked a UK company to look at designing an ARM-based board to run Linux / Docker with, amongst other things, a ZeroTier app... The goal is remote site monitoring and control that will work anywhere with an internet connection and require no local network configuration.

I realise giving direct support to everyone with a free account isn't realistic, but I am hoping that something this odd, and fundamental, can be addressed, if only to give me confidence to move forward with a commercial rollout, albeit a modest one!

Al

Hmmm - always difficult to troubleshoot these kinds of problems. The first place to check would be "sudo zerotier-cli peers" to ensure that you are not going through a relay. You should see your network participants in the resulting list as LEAF entries. If any are showing up as RELAY, that's going to kill your performance, and you might need to check any port forwarding configuration that was done prior to the power outage and make sure that it survived and/or got updated correctly (assuming UPnP, for example).
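
To make that check quick to repeat, here is a minimal Python sketch that scans the text output of "zerotier-cli peers" and lists the peers whose line contains the token RELAY. It assumes the whitespace-separated layout printed by recent 1.x clients (a 10-hex-digit address in the first column, RELAY or DIRECT somewhere on the same line). The helper name relayed_peers and the sample text are made up for illustration and are not part of ZeroTier.

```python
import re

# ZeroTier node addresses are 10 hexadecimal digits.
ADDRESS_RE = re.compile(r"^[0-9a-f]{10}$")


def relayed_peers(peers_output: str) -> list[str]:
    """Return the addresses of peers whose line mentions RELAY.

    Assumption: each peer is on one line, the address is the first
    column, and a relayed peer has the word RELAY as one of its columns.
    """
    relayed = []
    for line in peers_output.splitlines():
        fields = line.split()
        # Skip the "200 peers" status line, the column header and blanks.
        if not fields or not ADDRESS_RE.match(fields[0]):
            continue
        if "RELAY" in fields:
            relayed.append(fields[0])
    return relayed


# Illustrative sample only; the addresses and IPs are invented.
SAMPLE = """200 peers
<ztaddr>   <ver>  <role> <lat> <link> <lastTX> <lastRX> <path>
aaaaaaaaaa 1.12.2 LEAF      -1 RELAY
bbbbbbbbbb 1.12.2 LEAF      34 DIRECT 1200     1200     192.0.2.10/9993
cccccccccc -      PLANET   120 DIRECT 4000     4000     198.51.100.7/9993
"""

if __name__ == "__main__":
    print(relayed_peers(SAMPLE))
```

To use it on a real node, paste the output of the command into a string and pass it to relayed_peers; any address it returns is a peer you are only reaching via a relay.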

I've been having really slow connections lately as well. I have not seen this issue during the past couple of years using the service, so I'm also curious whether anyone knows if this is a bug, a temporary issue or something more serious.

It's been like this for at least a few weeks now.

There was no network configuration done on the site at all so I am 100% sure that nothing changed prior to the power outage, or after it for that matter.
The area is rural, so cell service is likely from a mast shared by multiple providers, and probably the only one in range. It is therefore unlikely, although not impossible, that the external connection has changed either. I will jump on the Pi now and run the command you suggest...

[image]

The last LEAF, whatever that is, is me, so that's bad, I am guessing!

Pinging back the other way, to my lappy, looks just as bad...
[image: ZT3]

Internet speed test at my place and from the Pi...
[image: ZT4]
[image: ZT5]

Yeah - everything except one device is being relayed, so that's going to suck for performance. There may have been some back-end changes on the ISP side that are putting you behind a double or triple NAT configuration. Worth looking into the "TCP Relay" page in the ZeroTier Documentation or, possibly better, the "Route between ZeroTier and Physical Networks" page.

Thanks, I will have a look at those. No UK mobile networks offer fixed public IP addresses and all use multiple NAT layers, so inbound routing simply isn't possible unless you buy into a private APN service. These are readily available but typically have significant data charges compared to those offered directly by the carriers. I pay around £25/month for unlimited data at home. Several of the sites I service have 4G connections with public IP addresses, so we can route in, but the service we use charges circa £12/GB/month, more if you go over the data allowance, which seems fairly typical.

Same here, ZT is so slow it is not usable for an RDP connection. RDP to another computer with Tailscale on the same network works fine.

+1, ZT is wonky at the moment. I can barely access the MikroTik web config (WebFig) through ZT; it hangs 9/10 times. 20 tries later, I managed to wake my home PC and RDP to it, and it worked perfectly. Then, all of a sudden, RDP times out or breaks after 10 seconds. No updates were performed, and the config hasn't been touched for months. Fingers crossed this doesn't last long...

Agree, ZT was fine until last night. Been wonky all morning. Very low speed and disconnects. Something is going on.

So glad to see I'm not the only one.

I'm experiencing really bad connection quality (40-60% ICMP packet loss, TCP retransmissions) between all my ZT-enabled nodes as well. This is making the nodes practically unreachable, and has been happening for about 1.5 weeks now. Previously it had been working flawlessly for a few years at least (of course with minor, negligible exceptions). It just happened overnight. I haven't changed anything in my infra setup.

I am facing the same problem. It was good, then suddenly it has been very slow for a week now.

It's been about a week since ZT became extremely slow on one of my computers. I have a couple of other PCs that are connecting and responding as usual. The only one that seems to have developed this issue is running Win 7. It is not an ISP issue, since two of the computers I am using are on the same network: I can connect to them both remotely, but only one (with Win 7) has the speed issue, which is a recent development.

Me too. We usually put a "no remap" outbound NAT policy on the edge firewall to prevent agents from getting stuck in relaying, but this seems not to be working recently. I have tested on multiple networks with multiple devices and, no matter what, I am relaying.

More threads about this as well:

More testing on this. I literally have to give my laptop a public IP in order not to be relaying.
If I leave this network (created 3 weeks ago) and join one which was created more than 12 months ago, it works fine with NAT.
In other words, this issue is nothing to do with my end, but must be something to do with how ZT servers detect NAT.

Multiple devices using ZeroTier from that network? UPnP might be having a mapping conflict for the port to map to the internal IP. Not sure how that interaction works, but the old one may be pointing 9993 to your laptop, and the second network may have been assigned a different port number which UPnP is not picking up.

No UPnP in use on any of these networks. It's not just me, look at all the other posts about this.

Agreed that there are lots of posts, and I realize this is not just you. It was just a comment based on the description of your symptoms and results (public IP OK, different ZeroTier networks giving different results from behind routers).

From what I understand of the internals of ZeroTier, performance issues are (usually) the result of having to go through a relay. A relay appears to be required when you can't get direct UDP traffic, generally as a result of multiple NAT layers. With a direct internet IP address, there's no issue (firewalls permitting).

If there's one layer of NAT (the router actually has an internet IP and is not behind some other CGNAT layer), UPnP/NAT-PMP or manual forwarding redirects the UDP traffic to the correct internal IP. Checking on my RouterOS and pfSense installs, I can see these mappings happening, and the clients are clearly available with direct connections and not going through relays.

If there are multiple layers of NAT, then connectivity has to go through a relay, since there's no way to direct the UDP traffic end-to-end between network participants. The problem that I have seen is that a lot of ISPs are moving to CGNAT because they don't have enough internet IPv4 addresses available to give to all their clients. So you may not have changed anything, but the uplink is now assigning addresses in a private address space rather than an internet IP.
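
One rough way to check for this is to look at the WAN address your router was actually given and see whether it falls in a private or carrier-grade NAT range. The Python sketch below does that with the standard ipaddress module. The function name classify_wan_address and the labels it returns are made up for illustration. An address outside these ranges is only "likely public", since an ISP can still NAT you in other ways.

```python
import ipaddress

# Ranges that are not directly reachable from the internet.
NON_PUBLIC_RANGES = {
    "RFC 1918 private": ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"],
    "CGNAT shared space (RFC 6598)": ["100.64.0.0/10"],
}


def classify_wan_address(addr: str) -> str:
    """Label an IPv4 WAN address as private, CGNAT or likely public."""
    ip = ipaddress.ip_address(addr)
    for label, cidrs in NON_PUBLIC_RANGES.items():
        if any(ip in ipaddress.ip_network(cidr) for cidr in cidrs):
            return label
    # Not proof of a routable address, only that it is outside
    # the ranges listed above.
    return "likely public"


if __name__ == "__main__":
    # Example addresses chosen for illustration only.
    for wan in ("100.72.3.4", "192.168.1.1", "8.8.8.8"):
        print(wan, "->", classify_wan_address(wan))
```

If the router's WAN address comes back as private or CGNAT, or if it differs from the public IP that an external "what is my IP" service reports, there is at least one more NAT layer upstream of the router.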

It's worth spending some time looking at the results of the "zerotier-cli peers" command from different endpoints and comparing them against your router IPs and any UDP mappings on the routers.

... just an idea:

  • Has anybody tried allowing IPv6 on the network and pinging through that?