Unstable connection

Hi there!
I share a ZeroTier network between a few computers. Sometimes those computers are on the same internal network, sometimes not. That’s one of the reasons we use ZeroTier.

Sometimes the connection becomes really unstable. I know the problem isn’t in my physical network, because the timeouts only show up when I ping a machine through its ZT IP address:

Request timeout for icmp_seq 3
64 bytes from 10.*.14: icmp_seq=3 ttl=64 time=1007.520 ms
64 bytes from 10.*.14: icmp_seq=4 ttl=64 time=8.545 ms
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8
Request timeout for icmp_seq 9
64 bytes from 10.*.14: icmp_seq=7 ttl=64 time=3159.897 ms
64 bytes from 10.*.14: icmp_seq=8 ttl=64 time=2158.794 ms
64 bytes from 10.*.14: icmp_seq=9 ttl=64 time=1157.515 ms

If I ping the same machine through the local network, everything works fine:

64 bytes from icmp_seq=0 ttl=64 time=2.206 ms
64 bytes from icmp_seq=1 ttl=64 time=1.730 ms
64 bytes from icmp_seq=2 ttl=64 time=84.514 ms
64 bytes from icmp_seq=3 ttl=64 time=1.744 ms
64 bytes from icmp_seq=4 ttl=64 time=2.249 ms
64 bytes from icmp_seq=5 ttl=64 time=1.113 ms
64 bytes from icmp_seq=6 ttl=64 time=71.381 ms
64 bytes from icmp_seq=7 ttl=64 time=5.673 ms
64 bytes from icmp_seq=8 ttl=64 time=2.375 ms
64 bytes from icmp_seq=9 ttl=64 time=25.289 ms

I have absolutely no clue why my ZT network is unstable. We have only 5 machines registered on the VPN (one of them is offline right now), with the default configuration and no tweaks at all. The only constraint is that I can’t manage the network’s router (that’s the main reason I use ZT).

I have checked https://zerotier.atlassian.net/wiki/spaces/SD/pages/6815768/Router+Configuration+Tips and I don’t have double NAT.

I have this problem between each of my machines, not only two of them.

I have found some people with network issues, but each time they had to restart their VPN or something like that. In my case, the network is just unstable for a few seconds at random times. Maybe it’s because of congestion on the ZT servers? Maybe something else.

Any idea?

When you do zerotier-cli peers, does it say DIRECT or RELAY? Does the IP address under path make sense?
A 3-second ping time means it’s either relaying through something very far away, or the CPU is too busy.

I would have thought it would be a NAT setting. Is your internet connection also stable? A 3-second ping is off the charts, I must say. I’m getting about 350 ms on a few networks and 60 ms on a different network.

Thank you for your answers.

@zt-travis: Every connection is direct except for one computer, which is offline:

<ztaddr>   <ver>  <role> <lat> <link> <lastTX> <lastRX> <path>
3a46xxxxxx -      PLANET   156 DIRECT 4218     4064     185. xxxxxxxxxxxx/9993
5b90xxxxxx 1.4.6  LEAF       1 DIRECT 14227    14225    2a01: xxxxxxxxxxxx:fad2:93bb/9993
62f8xxxxxx -      PLANET   254 DIRECT 4218     3963     2001: xxxxxxxxxxxx:2::2/9993
6fb4xxxxxx 1.4.6  LEAF      -1 RELAY
778cxxxxxx -      PLANET   118 DIRECT 4218     4100     2605: xxxxxxxxxxxx:f2bc:a1f7:19/9993
8286xxxxxx 1.4.8  LEAF     154 DIRECT 9222     9069     34. xxxxxxxxxxxx/21000
992fcxxxxxx -      PLANET    18 DIRECT 4218     4199     195. xxxxxxxxxxxx/9993
a0efxxxxxx 1.4.6  LEAF      84 DIRECT 14227    14131    2a01: xxxxxxxxxxxx:f419:c04f:d6f/36869
fd3fxxxxxx 1.4.6  LEAF       2 DIRECT 4218     4216     2a01:xxxxxxxxxxxx:88c0:55f8/9993
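
For anyone who wants to watch this over time, here is a minimal Python sketch that reads a peers listing like the one above and flags LEAF peers that are relayed or slow. It assumes the 1.4.x column layout shown in this post (other versions may print different columns), and the function names and the 200 ms threshold are arbitrary choices for illustration, not anything ZeroTier defines.

```python
# Sketch only: assumes the column order of the `zerotier-cli peers` output
# quoted in this thread (<ztaddr> <ver> <role> <lat> <link> ...).
# parse_peers / flag_peers and the thresholds are illustrative names and values.

SAMPLE = """\
<ztaddr>   <ver>  <role> <lat> <link> <lastTX> <lastRX> <path>
3a46xxxxxx -      PLANET   156 DIRECT 4218     4064     185. xxxxxxxxxxxx/9993
6fb4xxxxxx 1.4.6  LEAF      -1 RELAY
8286xxxxxx 1.4.8  LEAF     154 DIRECT 9222     9069     34. xxxxxxxxxxxx/21000
fd3fxxxxxx 1.4.6  LEAF       2 DIRECT 4218     4216     2a01:xxxxxxxxxxxx:88c0:55f8/9993
"""


def parse_peers(text):
    """Turn the whitespace-separated listing into a list of dicts."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row that starts with "<ztaddr>".
        if len(fields) < 5 or fields[0].startswith("<"):
            continue
        ztaddr, version, role, latency, link = fields[:5]
        peers.append({
            "ztaddr": ztaddr,
            "version": version,
            "role": role,
            "latency_ms": int(latency),  # -1 appears when there is no path
            "link": link,
        })
    return peers


def flag_peers(peers, max_latency_ms=200):
    """Return LEAF peers that are not DIRECT or exceed the latency threshold."""
    return [
        p for p in peers
        if p["role"] == "LEAF"
        and (p["link"] != "DIRECT" or p["latency_ms"] > max_latency_ms)
    ]


for peer in flag_peers(parse_peers(SAMPLE)):
    print(peer["ztaddr"], peer["link"], peer["latency_ms"])
```

To run it against a live node, the sample text could be replaced with the captured output of zerotier-cli peers, for example through subprocess.run with capture_output=True and text=True.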

@digixltd: The internet connection seems stable and fast (it’s a 1 Gbps connection):

PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=121 time=9.56 ms
64 bytes from icmp_seq=2 ttl=121 time=9.32 ms
64 bytes from icmp_seq=3 ttl=121 time=9.89 ms
64 bytes from icmp_seq=4 ttl=121 time=9.63 ms
64 bytes from icmp_seq=5 ttl=121 time=9.75 ms
64 bytes from icmp_seq=6 ttl=121 time=9.15 ms
64 bytes from icmp_seq=7 ttl=121 time=9.92 ms

Still no idea? It’s pretty annoying.

Sorry we missed this one. Is this happening when at least one of the nodes is on the 192.168 physical network? I’m not sure exactly what it could be yet.

I’ve only seen that on the 192.168 network indeed. This morning everything is pretty stable (ping replies under 3 ms). As soon as I see some slowdowns, I’ll check from an external IP whether the issue shows up there too.

So… a few minutes after that, on my computer in the 192.168 network (the same network as the main server I’m testing), I encountered random packet loss (sometimes lasting 4 to 5 seconds) and then, after packet 115, the network went down for 3 to 4 minutes.

On a mobile phone over a (very average) 3G network, my ping was consistently as expected (between 80 and 110 ms), with only 5 packets lost in 3 minutes. That is reasonable, since this connection can’t be considered really stable, and I never lost two packets in a row, just one from time to time.

When I connect the same phone to the Wi-Fi network (which has the same subnet mask as the other computers), I see much more packet loss (7.84%), and most of the time the lost packets come in groups (4 to 10 second timeouts).
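
The two numbers compared above (overall loss percentage and how many packets are lost in a row) can be pulled out of a saved ping log with a short script. This is a minimal sketch that assumes the macOS-style lines seen earlier in this thread ("Request timeout for icmp_seq N" and "icmp_seq=N ... time=X ms"). The sample log and its 10.0.0.14 address are made up for illustration; they are not measurements from this network.

```python
import re

# Line patterns assumed from the macOS-style ping output quoted in this thread.
TIMEOUT = re.compile(r"Request timeout for icmp_seq (\d+)")
REPLY = re.compile(r"icmp_seq=(\d+)\b.*?time=([\d.]+) ms")

# Made-up sample log (hypothetical address), not data from the thread.
SAMPLE = """\
64 bytes from 10.0.0.14: icmp_seq=0 ttl=64 time=2.1 ms
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
64 bytes from 10.0.0.14: icmp_seq=3 ttl=64 time=1007.5 ms
64 bytes from 10.0.0.14: icmp_seq=4 ttl=64 time=8.5 ms
Request timeout for icmp_seq 5
64 bytes from 10.0.0.14: icmp_seq=6 ttl=64 time=3.0 ms
64 bytes from 10.0.0.14: icmp_seq=7 ttl=64 time=2.4 ms
"""


def summarize(ping_output):
    """Count lost probes, late replies and the longest run of lost probes."""
    timed_out = set()   # sequence numbers reported as timed out
    replies = {}        # sequence number -> round-trip time in ms
    for line in ping_output.splitlines():
        match = TIMEOUT.search(line)
        if match:
            timed_out.add(int(match.group(1)))
            continue
        match = REPLY.search(line)
        if match:
            replies[int(match.group(1))] = float(match.group(2))

    seqs = sorted(timed_out | set(replies))
    # A probe is lost only if no reply ever showed up for it.
    lost = [s for s in seqs if s not in replies]

    longest = run = 0
    previous = None
    for seq in lost:
        run = run + 1 if previous is not None and seq == previous + 1 else 1
        longest = max(longest, run)
        previous = seq

    return {
        "probes": len(seqs),
        "lost": len(lost),
        "late": len(timed_out & set(replies)),  # timed out, then answered
        "loss_pct": round(100.0 * len(lost) / len(seqs), 2) if seqs else 0.0,
        "longest_loss_burst": longest,
    }


print(summarize(SAMPLE))
```

A reply that arrives after its timeout was already printed is counted as late rather than lost, which matters for logs like the first one in this thread, where several replies show up with 1 to 3 second round-trip times.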

We had the same kind of stability issues when we tested ZT over 4G. We couldn’t figure out the reason, so we ended up switching to another VPN solution. If you find a solution, I would be interested to know what it was.

@jesse.jamsen If I don’t find a solution, I’d be interested to know what you ended up using. ZeroTier seems to be the only software that fits my needs (no access to the router to set up port forwarding, computers on remote or local networks, multiple OSes such as Linux, Mac, Windows or even phones).

@p.bernardeau, in our case we went with Pritunl. It’s a centralized OpenVPN solution, but it solved our use case (we have 3G/4G-enabled Linux gateways and we need to connect to devices on the local networks of those gateways). It requires far more setup than ZT since we need to run our own server, and if this issue were solved I would happily switch back.

So should I consider it a bug and wait for a fix? I can live with it for now, as I mainly work on a Mac and can easily switch between hosts files with Gas Mask.

It might be worth trying 1.5: https://download.zerotier.com/PRERELEASES/1.5.0/dist/
(The Windows version isn’t signed yet, so you might want to wait if on Windows.)

Do I need to install it on every computer on my network? Otherwise Windows will wait, or I’ll wait before using it on the network (it’s not important yet).

No, it’s backwards compatible.

Out of interest, I have seen something similar when using /16 subnets in the managed routes section. We have sites with multiple /24 networks, so we planned to summarise them into one /16 per site, but this caused intermittent connectivity like you describe.
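
One thing that is cheap to rule out in a setup like this is a summarised managed route that also covers the physical LAN a node sits on. Whether that is actually the cause here is only a guess; nothing in this thread confirms it. The sketch below uses Python’s standard ipaddress module, and every address in it is hypothetical.

```python
import ipaddress


def overlapping_routes(managed_routes, physical_subnets):
    """Return (route, lan) pairs where a managed route overlaps a local subnet."""
    pairs = []
    for route in map(ipaddress.ip_network, managed_routes):
        for lan in map(ipaddress.ip_network, physical_subnets):
            # Only compare networks of the same IP version.
            if route.version == lan.version and route.overlaps(lan):
                pairs.append((str(route), str(lan)))
    return pairs


# Hypothetical values: a /16 summary next to a more specific /24 route,
# checked against the LAN the machine is physically connected to.
managed = ["192.168.0.0/16", "10.147.17.0/24"]
local_lans = ["192.168.1.0/24"]

for route, lan in overlapping_routes(managed, local_lans):
    print(f"managed route {route} overlaps physical subnet {lan}")

# Summarising a /24 into its enclosing /16, as described in the post above.
site_24 = ipaddress.ip_network("192.168.3.0/24")
print(site_24.supernet(new_prefix=16))
```

If the check reports an overlap, keeping the more specific /24 routes instead of the /16 summary would be one way to test whether the intermittent connectivity goes away.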

I’ve just tested by updating two of my peers to 1.5.0, and I still have the issue between those two.

I have exactly the same problem. I have about 30 peers (Raspberry Pi with DietPi OS) on the same network, all talking to one peer running Ubuntu Server 18.04. I also have my Windows laptop on the network to SSH into the peers.

Sometimes I can’t SSH into a peer even though I see it Online on the controller. I have to unauthorize/reauthorize it, sometimes more than once, to finally be able to connect.

Sometimes I can SSH into a peer, but the latter can’t ping my Ubuntu Server. It’s intermittent.

I tried configuring a moon, but it doesn’t help.

In my case, those peers are shown as “RELAY”. They are connected to a cellular modem.