I’m using ZeroTier for a network between my MacBook Pro and my NAS. However, I’m experiencing a connection failure about every 30 minutes.
My MacBook has an M1 Pro chip and runs ZT 1.10.2, connecting to the Internet via WiFi. My NAS runs ZT 1.8.4 in a Docker container, connecting to the Internet via a 4G mobile network. The link between the two devices is not direct (it is relayed): data is transferred by a moon set up on a cloud server, which has a direct P2P link to each of the two devices.
I found this failure because my file sync service always aborted at approximately the same point. I made several attempts and found that the connection failed about every half an hour. To test it, I pinged the NAS from my laptop every 2 seconds. The connection issues started at about packet No. 972 and packet No. 2032 (roughly 32 and 68 minutes into the test at a 2 second interval), and the connection was restored after anywhere from half a minute to about ten minutes (the longer outages happened in other attempts). I attached a screenshot of my terminal that recorded two failures and recoveries for more info.
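For anyone who wants to reproduce this measurement from a saved ping log instead of a screenshot, here is a minimal Python sketch. It assumes the macOS `ping` output format (`icmp_seq=N` on replies, `Request timeout for icmp_seq N` on losses) and a fixed 2 second interval; the function names `find_outages` and `_summarize` are made up for illustration.

```python
import re

# Reply lines, e.g. "64 bytes from 10.0.0.2: icmp_seq=972 ttl=64 time=45.1 ms"
REPLY = re.compile(r"icmp_seq=(\d+)")
# macOS-style loss lines, e.g. "Request timeout for icmp_seq 973"
TIMEOUT = re.compile(r"Request timeout for icmp_seq (\d+)")


def _summarize(first, last, interval_s):
    """Describe one run of consecutive lost packets."""
    start_min = round(first * interval_s / 60, 1)   # minutes since the ping started
    duration_s = (last - first + 1) * interval_s    # approximate outage length
    return (first, last, start_min, duration_s)


def find_outages(lines, interval_s=2.0):
    """Return a list of (first_lost_seq, last_lost_seq, start_minutes, duration_s)."""
    outages = []
    first = last = None
    for line in lines:
        lost = TIMEOUT.search(line)
        if lost:
            seq = int(lost.group(1))
            if first is None:
                first = seq
            last = seq
        elif REPLY.search(line) and first is not None:
            # A reply after a run of timeouts closes the outage.
            outages.append(_summarize(first, last, interval_s))
            first = last = None
    if first is not None:
        # The log ended while packets were still being lost.
        outages.append(_summarize(first, last, interval_s))
    return outages
```

Running it over the logs from two machines and comparing the `start_minutes` values would show whether their outages overlap or are independent.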
What’s more, I ran the same test on a Windows PC running ZT 1.8.4 on the same WiFi as my MacBook, connecting to the same NAS via a different ZT network, and got the same result: the connection was lost about every 30 minutes. The weird thing was that the Mac and the Windows PC did not fail at the same times. When I got timeouts on the Mac, the Windows PC was getting replies as normal, and vice versa.
This totally confused me. I assumed the cause was the 4G connection on the NAS side, but that could not explain why my Mac and Windows PC had independent failure periods. Now I think it might have something to do with the ZT controller, or with some rule on port usage (like a time limit) set by my 4G carrier.
Can someone help me out? I’d really appreciate it.
Here’s the attachment. Sorry I can only upload one picture, so I have to put everything in one screenshot. There are titles for every section for better understanding.
I ran some further tests and found that it may have nothing to do with the 4G connection. It might be the moon server that caused the problem. I deorbited the moon on my laptop and tried the ping again. The connection no longer failed every 30 minutes, but the latency increased.
My moon server is currently a Docker container on a cloud server with a public IP address. After orbiting it on both sides, the latency is remarkably lower, so I assume it is working. But why does it break the connection every 30 minutes? Is there a configuration option that sets this time limit?
You are still seeing the 30 minute issue with the new moon?
What kind of router is each node behind?
Yes, the issue still exists. If I ping another device with the moon orbited on both sides, the connection is lost about every 30 minutes and then restored. Without the moon, I get random timeout errors, which appears to be normal.
After narrowing it down to the moon, I tried to solve the issue. The moon service was running in Docker, using the image “seedgou/zerotier-moon”. I upgraded the software on the cloud server, updated the Docker image, and rebuilt the container. I also tried opening all UDP ports in the server’s firewall settings. None of the above worked.
My laptop was on WiFi provided by a local ISP, the moon server had a public IP address, and my NAS was connected to the Internet via a 4G connection and a router. There was more than one layer of NAT on both the laptop and NAS sides, preventing the devices from building a direct link. But the moon did appear in my devices’ peer lists and showed direct.
One more thing: I actually deployed two ZeroTier services on the moon server, in different containers. One was for the moon service and the other for a normal ZeroTier virtual network service. The moon service used port 9994 and the normal ZeroTier service ran on the default port 9993, to keep the two services independent. But in my laptop’s peer list, the moon showed direct while the other device ID, used by the virtual network service, showed relay. If I pinged the virtual network’s ZT address (with the moon orbited on both sides), I still got the 30 minute issue. Does the issue have something to do with running two ZT services on one machine?
Also, I only opened UDP ports 9993 and 9994, plus certain ports for other services on the server; the default policy for all other UDP ports was set to reject. Could this setting be causing the issue?
Thank you for your help. I really appreciate it.
This is likely related to the double NAT and I’m not sure if there is a way to make it perfect.
For your moons, you may need to allow more in the firewall. Not all of your peers will be on 9993. Maybe you’re already allowing all outgoing udp?
-A OUTPUT -m owner --uid-owner zerotier-one -j ACCEPT will allow all outgoing traffic for the zerotier process.
Thank you for your reply!
I didn’t set any limits on outgoing UDP on the moon server, so I guess that’s not the reason.
I didn’t stop testing after I posted this request for help. There were several factors that may have affected the results, like the unstable 4G network and the unreliable wireless connection, which made the tests very difficult to carry out. However, I did find that it may have something to do with the Docker image I was using. I changed the Docker image and saw a remarkable improvement in connection quality. I’ll keep testing to see whether the problem is solved.
I also have another question about the ports. Since you mentioned it here, I’ll just ask it here instead of opening a new post. I’m hosting a ZT moon with a public IP address. From the knowledge base and what you said in your reply, I understand that opening UDP port 9993 alone is far from enough. The knowledge base says ZT will also use dynamic high-numbered ports. So my question is: is there a known range for these port numbers? I think allowing all incoming UDP ports would be too dangerous. If there is a range of ports that may be used by ZT, at least I can block the others to improve server security.
On the public IP address, I think just incoming 9993 is OK.
When nodes are behind NAT, the NAT translates 9993 to random-high-number so you need to send to any port. Some nodes are on public IPs, but listening on other ports too.
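To illustrate the point about NAT, here is a toy Python model of a port-remapping NAT. It is only a sketch under assumptions: the class `ToyNat`, the port range 1024 to 65535, the random port choice, and the example addresses are all invented for illustration, and real NATs differ (some preserve the source port).

```python
import random


class ToyNat:
    """Toy port-remapping NAT; not a model of any specific router."""

    def __init__(self, public_ip, seed=None):
        self.public_ip = public_ip
        self._rng = random.Random(seed)
        self._table = {}  # (private_ip, private_port) -> public_port

    def translate(self, private_ip, private_port):
        """Return the (ip, port) an outside server sees for this sender."""
        key = (private_ip, private_port)
        if key not in self._table:
            used = set(self._table.values())
            port = self._rng.randint(1024, 65535)
            while port in used:  # keep one public port per inside endpoint
                port = self._rng.randint(1024, 65535)
            self._table[key] = port
        return (self.public_ip, self._table[key])


if __name__ == "__main__":
    nat = ToyNat("203.0.113.7")
    # A node behind the NAT sends from its local port 9993 to the moon.
    seen_ip, seen_port = nat.translate("192.168.1.20", 9993)
    # The moon receives the packet on its own port 9993, but must reply to
    # whatever source port the NAT picked, which is usually not 9993.
    print(f"moon replies to {seen_ip}:{seen_port}")
```

In this model the server on the public IP only needs its own listening port open for incoming traffic, while its outgoing replies go to arbitrary high-numbered ports.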
Thank you for your help!
I consulted my cloud service provider. My cloud server does have a static public IP, but it connects to the Internet through a NAT. However, this NAT does not perform any port remapping: for example, if I visit TCP port 80 on the public IP address, the NAT forwards the request directly to the same port on the server, and the IP address is not shared with other servers. I think this is equivalent to a direct link to the Internet.
I didn’t set any limits on outgoing traffic in the firewall, so the server was able to send to any port. But when I only opened incoming UDP port 9993, I sometimes saw the server as RELAY in other devices’ peer lists. When I opened all incoming UDP ports, I sometimes saw the server on a port number like 46567 instead of 9993.
So, if a server is directly linked to the Internet and has ZT installed, will it listen on ports other than 9993? If so, is there an approximate range for those ports? Opening all incoming UDP ports seems a little dangerous for a server on the Internet.