Connection problems with ZeroTier on a moving cellular (4G/LTE/3G) client

Situation sketch:
We have the following situation: one node in the ZT network is fixed (either on Wi-Fi in a building, or on a static 4G Wi-Fi router outdoors), and the other node in question is moving around at higher speeds, such as in a car on the highway.

The problem
The connection between these two nodes can be very good (50 ms ping or so), but there are also moments, periods, or areas (I’m unsure what causes this) where the ping grows to 15 seconds, or the connection drops out completely for minutes (while the node itself still seems to have a proper 4G or 3G connection).

The question
Does anyone have experience with moving devices connected over cellular (4G/3G) and connection problems?

The hypothesis
When moving between cellular towers, something happens in the handover (a double appearance before the old connection times out, different IPs, etc.) where ZeroTier might not allow the same client (ID-based?) to appear twice or to use a new IP within X time? Or it keeps sending data to the first IP, while the device itself has moved out of range of that tower and its IP? Or maybe it’s flipping back and forth between two cell towers, creating a problem for ZT.

We have noticed this too, and not just on LTE: when clients move from Ethernet to Wi-Fi, ZeroTier can take upwards of 5 minutes to reconnect.

I have a couple of theories on what might help:

  1. limiting traffic between nodes and only allowing traffic to the servers the clients need
  2. lowering the multicast recipient limit to something smaller than 60 (I am currently trying 8)

Not sure about the original poster’s issue. It depends on the carrier and region. This would be very hard to reproduce and try to improve. Is the phone getting new IP addresses frequently?

We have noticed this too, and not just on LTE: when clients move from Ethernet to Wi-Fi, ZeroTier can take upwards of 5 minutes to reconnect.

This is a known issue. Not sure when it’ll get improved.

Try having your other device, the stationary one, ping the mobile device.

Hi @Titus,

This might not be valid for all carriers/countries, but in my experience (in Europe), a mobile device should not switch IP when it switches towers. (At least not as long as it stays connected).

The ping test you are referring to, is that measured through the Zerotier network or between the endpoints on their public IP addresses?

Are you on IPv4/6?
Just a note: be aware that carriers tend to use CGNAT which might impact ping on the public IP addresses.

I am testing a script that pings a known ZT address and if that address doesn’t respond for 15 seconds, it’ll restart the ZeroTier service. This can cut down reconnect times by up to 80% (ymmv)
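A minimal sketch of what such a watchdog could look like on a Linux client, not the exact script described above. It assumes systemd, root privileges, and the standard `zerotier-one` service name; `ZT_PEER`, `TIMEOUT`, `decide`, and `watchdog` are names made up for this illustration, and the default peer address is a placeholder.

```shell
#!/usr/bin/env bash
# Watchdog sketch: restart ZeroTier when a known ZT peer stops answering.
# ZT_PEER and TIMEOUT are placeholders, not values taken from a real setup.
ZT_PEER="${ZT_PEER:-10.147.17.1}"   # hypothetical ZeroTier-managed address
TIMEOUT="${TIMEOUT:-15}"            # seconds without a reply before restarting

# Prints "restart" when the last good reply is at least <timeout> seconds old,
# otherwise "ok". Arguments: <last_ok_epoch> <now_epoch> <timeout_seconds>
decide() {
  if [ $(( $2 - $1 )) -ge "$3" ]; then echo restart; else echo ok; fi
}

# Main loop: one ping per second; restart the service once the peer has been
# silent for too long.
watchdog() {
  local last_ok now
  last_ok=$(date +%s)
  while true; do
    if ping -c 1 -W 1 "$ZT_PEER" >/dev/null 2>&1; then
      last_ok=$(date +%s)
    fi
    now=$(date +%s)
    if [ "$(decide "$last_ok" "$now" "$TIMEOUT")" = restart ]; then
      systemctl restart zerotier-one   # assumes systemd and root privileges
      last_ok=$(date +%s)              # give the service a full window to reconnect
    fi
    sleep 1
  done
}
```

Add a call to `watchdog` at the end of the file to start the loop. Resetting `last_ok` after a restart keeps the script from restarting the service again before it has had a chance to reconnect.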

Thank you all for your replies so far. I’m looking forward to getting to the bottom of this!

Just to clarify: it is not yet 100% certain that the cellular connection stays properly active, just that traffic routed through ZT seems to drop, whereas data going outside of ZT seems to stay alive. The modem statistics also keep showing a proper connection type and strength, hence the suspicion of ZT that I am trying to confirm or rule out.

So what I want to do to achieve this is have the client (not a phone, but a Linux computer) ping a ZeroTier address on a stable wired network, as well as some fixed IP (for example Google’s DNS server, 8.8.8.8). At the same time I want another ZT node on a stable (different) wired network to do the same. If the ZeroTier pings from the vehicle are high while the Google DNS pings remain low, it would point to ZT. If the DNS server pings are also high, it is more likely something in the connection itself.
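Something along these lines could do the side-by-side logging, as a rough sketch assuming Linux iputils `ping`. The function names and the default ZeroTier address are placeholders I made up for illustration.

```shell
#!/usr/bin/env bash
# Side-by-side latency logger: one ZeroTier target, one public target.
# The default ZT address is a placeholder for illustration.
ZT_TARGET="${ZT_TARGET:-10.147.17.1}"   # hypothetical ZT address of the wired node
PUB_TARGET="${PUB_TARGET:-8.8.8.8}"

# Reads ping output on stdin and prints the first round-trip time in ms,
# or "timeout" when there is no reply line.
parse_rtt() {
  local rtt
  rtt=$(sed -n 's/.*time=\([0-9.]*\) *ms.*/\1/p' | head -n 1)
  echo "${rtt:-timeout}"
}

# Sends a single ping to the given host and prints its RTT (or "timeout").
probe() {
  ping -c 1 -W 2 "$1" 2>/dev/null | parse_rtt
}

# Prints "epoch,zt_rtt,public_rtt" roughly once per second until interrupted.
monitor() {
  while true; do
    echo "$(date +%s),$(probe "$ZT_TARGET"),$(probe "$PUB_TARGET")"
    sleep 1
  done
}
```

Running `monitor >> pings.csv` on both the vehicle and the second wired node would give two logs that can be lined up by timestamp afterwards. The two probes run one after the other, so a line can take a few seconds to appear when one target times out.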

@zt-travis

Not sure about the original poster’s issue. It depends on the carrier and region. This would be very hard to reproduce and try to improve. Is the phone getting new IP addresses frequently?

So far we’ve observed it regularly, so I want to do more tests. I will also try to figure out if the vehicle gets new ip addresses or not. That’s a good one, since it was just a hypothesis so far.

Try having your other device, the stationary one, ping the mobile device.

This is exactly what I did: a fixed third-party computer on a wired network, pinging both the static 4G device and the moving 4G/3G device. This is where I observed the low ping for the static one, and a very high ping with dropouts for the moving one (~80 km/h).

@timmmy

This might not be valid for all carriers/countries, but in my experience (in Europe), a mobile device should not switch IP when it switches towers. (At least not as long as it stays connected).

The current situation is in The Netherlands, but it has to work worldwide. I will however try to confirm if it changes IP or not.

The pings I mentioned are through the ZT network, as that is how the traffic that is dropping out is routed. As described above I want to also ping a fixed public IP address (8.8.8.8) from the moving vehicle, to compare.

Are you on IPv4/6?

“Physical IP” in my.zerotier shows an IPv4 address, but I will also check and confirm that.

Just a note: be aware that carriers tend to use CGNAT which might impact ping on the public IP addresses.

Yes, I’m aware that a ping might not be the most representative test, if you have another idea on how to diagnose this please let me know :+1:

@HorizonsCT

I am testing a script that pings a known ZT address and if that address doesn’t respond for 15 seconds, it’ll restart the ZeroTier service. This can cut down reconnect times by up to 80% (ymmv)

Sounds like a workaround to try, but not really workable. To be honest, for our application 15s is already quite long. Obviously the timeout can be lowered, but it would be quite a hack to restart the service so often.

@Titus

I’ll do some testing on a 30 minute drive today with my LTE laptop. I am almost positive my public IP doesn’t change. ZeroTier is VERY VERY, did I say VERY, sensitive to bad network conditions. OpenVPN and WireGuard didn’t seem to care too much and stayed connected. I noticed this while at someone’s house where the ISP was experiencing 30% packet loss. ZeroTier doesn’t work on that kind of connection, but every other service/device on that network was effectively blind to it. (Yes, they were all experiencing that packet loss.)

Here in the states, having ATT on my laptop, my IP doesn’t change unless my LTE card is off for like 35 minutes. But driving around I think Zerotier doesn’t alright.

I’m going to take a 30 minute ride today and run a slow iPerf test on my ZeroTier network over LTE.

@Titus:

Regarding the CGNAT topic: if you initiate the ping from your device on 4G, as you plan to do, you should be fine. If you ping from another device, you might only measure up to the NAT gateway.

I would not trust signal quality indicators as an indication of connectivity. They sometimes lag and ignore small dips, and I’ve seen connectivity fail completely while the signal was perfect.

In case you’re also getting bad results on the 8.8.8.8 ping, what might help is to fix the router on one network technology (3G only or 4G only). Switching technology when connectivity dips might make things worse. Some chips also do some weird stuff when they change technology.

I just tested a ZT client connection behind a NATing gateway with its WAN link via LTE (with a private APN).
The node connects OK (not in relay mode). The other nodes are all on 100 Mbit or better fiber behind a NATing gateway.
When pinging the mobile ZT node, the latency is approx. 250-300 ms with some 2-20 s dropouts.
If I try to establish an SSH connection at the same time, the ping session gets a “very long” dropout and the SSH client stalls. Some Wireshark sniffing on the station side reveals that some serious packet fragmentation is going on…
ZT interfaces get a default MTU of 2800 (whereas 1500 is normally the safe, commonly supported value).
If your node is on fiber or any newer DSL link, most of the infrastructure is likely to support an MTU of 1500 AND “jumbo frames” up to 9000 (which implies that 2800 is still safe).
When you are on a mobile network (LTE or 3G), several parts of the infrastructure chain do NOT support “jumbo frames” and actively block them. This means 1500 including headers is the theoretical maximum size. In reality there is also at least one VPN header of overhead that you need to subtract to get a usable MTU. On the “radio side” at the cells there is also some bad behavior with larger packets (mostly UDP). My conclusion after some 15+ years of using IP networking on 2G-4G…
DON’T USE AN MTU LARGER THAN 1280, otherwise it will fragment like hell, with all the bad things that follow.
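One way to see where fragmentation would start on the physical (non-ZT) path is to send pings with the Don’t Fragment bit set at a few sizes. This is only a sketch, assuming IPv4 and Linux iputils `ping` (`-M do`, `-s`). The function names and the list of candidate sizes are my own choices for illustration.

```shell
#!/usr/bin/env bash
# Largest ICMP payload that fits an IPv4 packet of the given MTU:
# 20 bytes IPv4 header + 8 bytes ICMP header = 28 bytes of overhead.
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Prints "fits" if an unfragmented packet of the given MTU gets a reply from
# the host, else "no-reply". Arguments: <host> <mtu>
probe_mtu() {
  local host=$1 mtu=$2
  if ping -c 1 -W 2 -M do -s "$(icmp_payload_for_mtu "$mtu")" "$host" >/dev/null 2>&1; then
    echo fits
  else
    echo no-reply
  fi
}

# Tries a few candidate packet sizes against a host reached over the
# physical link.
sweep_mtu() {
  local host=$1 mtu
  for mtu in 1500 1472 1400 1350 1280; do
    echo "$mtu: $(probe_mtu "$host" "$mtu")"
  done
}
```

For example, `sweep_mtu 8.8.8.8` run from the LTE side. A “no-reply” can also be plain packet loss, so it is worth repeating the sweep a few times before drawing conclusions.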

Now to the ZT and MTU settings:
You can NOT use ifconfig or the ip command to set the node’s MTU persistently: it works for a few minutes or so, but is then overwritten by the config from ZT Central.

I have NOT found any manual or instructions on how to set a local MTU in the local config, or for a specific node in ZT Central.

I have NOT found any way in the ZT Central UI to enter a “global” MTU setting.

I did find a solution using curl to post a “global” MTU setting (most likely related to address assignment),
but it also requires you to generate an API token… Not that user-friendly!
Still, that would decrease the overall performance…

Has anyone successfully managed to modify the MTU settings?

Best regards,
Anders

Hello,
Thanks for writing that up.
That 2800 MTU is for the virtual networks…

Anyway, if it helps and you want to use a different MTU:

export TOKEN=your-api-token
export NETWORK_ID=your-networkid
curl -X POST "https://my.zerotier.com/api/network/${NETWORK_ID}" -H "Authorization: bearer ${TOKEN}" -d '{"config": {"mtu": 1280}}'

We’ve gone back and forth on including that in the UI, but 99% of the time it’s not a useful setting to change.

You’ll have to leave/rejoin the network to get the new MTU, if I remember correctly. Or possibly restart the zerotier service.

Yes you need to create an API token on the account page to do this.

These commands will work from a Mac or Linux terminal. I don’t know the equivalent Windows commands.
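To check whether the new MTU actually arrived on a Linux node after the leave/rejoin, something like the following might work. It is only a sketch: `iface_mtu` and `rejoin_and_check` are hypothetical helper names, the 10-second wait is arbitrary, and reading the MTU from `/sys/class/net` is Linux-specific.

```shell
#!/usr/bin/env bash
# Prints the MTU of a network interface as reported by Linux sysfs.
# The optional second argument overrides the sysfs root (handy for testing).
iface_mtu() {
  cat "${2:-/sys/class/net}/$1/mtu"
}

# Leaves and rejoins the network so the client fetches the new config,
# then prints the MTU of the given ZeroTier interface.
# Arguments: <network_id> <zt_interface_name>
rejoin_and_check() {
  zerotier-cli leave "$1"
  zerotier-cli join "$1"
  sleep 10   # arbitrary wait for the network config to arrive
  iface_mtu "$2"
}
```

Usage would be along the lines of `sudo bash -c 'source check_mtu.sh; rejoin_and_check "$NETWORK_ID" ztXXXXXXXX'`, where `check_mtu.sh` is whatever file the functions are saved in and `ztXXXXXXXX` stands for the interface name that `ip link` shows for the network.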