Windows node stopped working

I have a certain network. It has 3 Windows nodes on it. Everything was working fine on all 3 nodes yesterday.

Today Node C can no longer send or receive any ZT traffic. A & B can pass traffic between them. C will talk to neither one.

I tried disable/enable on the ZT adapter in Windows on C. Tried a full reboot of C. I tried disabling the firewall on C.

Nodes B & C are literally sitting beside each other on the same LAN. B works, C doesn’t. Yesterday all 3 worked. The ZTGUI app shows network status ‘OK’ on C. The ZT central status for all 3 is “online”.

Node C is running Win 10 Home x64. A&B are running Win 10 x64.

Any suggestions?

I have encountered a scenario similar to yours. I tried everything I could think of, and even created a thread about the issue.
What worked for me was moving the devices to a new network. That seemed to make all the devices start communicating again. P.S. This is a temporary solution; I would like to see what the devs think about this… It’s a problem that we need to sort out ASAP.

Here’s my post (still waiting for anyone to give me suggestions :wink: )

How long was C up and running for?
Did any Windows updates happen?
What kind of router is C behind? Can it be rebooted without disrupting too much?

ZT has been installed on it for about 4 days. The box had probably been rebooted within the last 24-48 hrs.

Windows updates are always happening; I dunno about any specifics at this time - but not aware of any.

C is behind an AT&T fiber optic business VOIP router. Can’t reboot it without much pain to the users. Note that B is on this same LAN behind this same router and working fine.

Following up … about 3 or so hours later it all started working. Reason unknown.

Note this isn’t the first time I’ve had this sort of “outage” with ZT. But I’ve decided I must find a solution of some sort. Really need it to Just Work.

Well, Node C is offline again. Don’t know when it started.

Have you tried moving to a new network? It worked for me and has not dropped for about 9 days now.

What exactly do you mean by “move it to a new network”?

Do I delete the existing network and re-create it and re-add all the nodes back into the new network?

You don’t have to delete it. You can even join two at once.


Can you open an admin PowerShell prompt and run
zerotier-cli peers
and
zerotier-cli listnetworks
and
zerotier-cli info
and paste the output here, or direct message it to me.
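
If it helps to grab all three outputs in one go while the node is actually failing, here is a minimal sketch in Python. It assumes Python is installed on the node, that `zerotier-cli` is on the PATH, and that the script is run from an elevated prompt; the function names are just illustrative.

```python
import shutil
import subprocess

# The three commands requested above; "zerotier-cli" is assumed to be on PATH.
COMMANDS = [
    ["zerotier-cli", "peers"],
    ["zerotier-cli", "listnetworks"],
    ["zerotier-cli", "info"],
]


def run_command(cmd):
    """Run one command and return its combined stdout/stderr as text."""
    # shutil.which resolves wrapper scripts (for example a .bat on Windows).
    executable = shutil.which(cmd[0]) or cmd[0]
    try:
        result = subprocess.run(
            [executable] + cmd[1:], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"(could not run: {exc})"
    return result.stdout + result.stderr


def collect_report(commands=COMMANDS, runner=run_command):
    """Build one text report with a header line before each command's output."""
    sections = []
    for cmd in commands:
        sections.append("$ " + " ".join(cmd))
        sections.append(runner(cmd).rstrip())
        sections.append("")
    return "\n".join(sections)


if __name__ == "__main__":
    print(collect_report())
```

Redirecting the printed report to a file gives something that can be pasted or sent in one piece.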


The way it’s flapping makes me suspect something about the router (I understand that another node is running fine behind it). Is there a way to look into the NAT settings on it? I’m not familiar with that model.

Does the internet connection ever drop out there? Or that PC’s connection?
There is a Windows user who has to stop ZeroTier for 10 minutes to get his connection to come back: https://github.com/zerotier/ZeroTierOne/issues/1214

Sorry for all the questions. We haven’t been able to reproduce these and would like to get them fixed.

Thanks. Will do. It’s been working this afternoon. Soon as I catch it failing again I’ll send you the output.

Hi
Yes, you can join a second network. In my case, however, I deleted my old network and moved the nodes to a new one. It has been stable since.

Can someone tell me how to direct message @zt-travis?

Even on the messages page I don’t see any way to actually create a message. And when I go to @zt-travis’s profile page I don’t see any way to send a message.

Apologies, I’m probably just being dense.

Thanks for that output.

It is strange that the connection to the controller is the only relayed one:
0cccb752f7 1.4.8 LEAF -1 RELAY
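
For anyone who wants to spot relayed peers in a longer listing, here is a rough Python sketch. It assumes each peer line has the column order seen in the line above (address, version, role, latency, link); other ZeroTier versions may print the columns differently, and `relayed_peers` is just an illustrative name.

```python
def relayed_peers(peers_output):
    """Return (address, role) pairs for peers whose link column reads RELAY.

    Assumes the column order seen in the line quoted above:
    address, version, role, latency, link.
    """
    relayed = []
    for line in peers_output.splitlines():
        fields = line.split()
        # Skip blank lines and anything too short to contain a link column.
        if len(fields) < 5:
            continue
        address, _version, role, _latency, link = fields[:5]
        if link == "RELAY":
            relayed.append((address, role))
    return relayed


# The single line quoted in this thread.
sample = "0cccb752f7 1.4.8 LEAF -1 RELAY\n"
print(relayed_peers(sample))
```

Feeding it the full text of `zerotier-cli peers` would list every peer that is currently being relayed rather than reached directly.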

hmm…

@zt-travis For the other questions you asked…

That AT&T router is pretty much a black box to us. It is a managed service. I have no idea how the NAT is configured. FWIW, there are other “cloud” services that seem to be working fine (e.g. Teamviewer).

The internet is a very reliable 20Mx20M fiber service. Pings are a very consistent 15ms to google.com. In the few months I’ve been doing support there I’ve never seen the Internet or the PC connection on that PC waver. It’s a pretty solid setup.

Another data point… On ZeroTier Central, both the nodes on that LAN/router show ‘Physical IP’ as ‘Unknown’. And it has been that way always. But when I’m on that network I can go to any of a dozen “what is my IP” sites and they will give the correct physical address. Something weird there.

Yet another data point … almost forgot about this … on that same LAN router I have a little Raspberry Pi running Ubuntu Server 20.04 with ZT. AFAICT the ZT on it has never failed. When I look at it in ZeroTier Central, the ‘Physical IP’ does not show ‘Unknown’, but instead it shows an IP addr that is totally wrong. And that wrong IP address does not appear to even be an AT&T assigned IP.

Probably one of our relay servers. We should change the UI a little so it doesn’t print those…
Post the address if you happen to come across it.

BTW, this evening “Node B” on that same LAN/router seems to have also stopped working for the last few hours.

I have some other nodes that have now stopped working. They are on different networks and different types of routers. Some of them are running Linux (Ubuntu).

I’ve been watching this quite a bit over the last few days. Here’s my very imprecise observation…

I have about a dozen nodes that I connect to fairly frequently. At any given time I can expect 1/4 to 1/3 of them to be unreachable. When they go offline it varies from hours to as much as 3 days.

I tried creating another network as some suggested. The failed nodes weren’t reachable on this new network either. In some cases they became reachable on the old network but the new one was still out.

They’re on a variety of routers and Internet services. They are a mix of Linux and Windows. If there’s a pattern I can’t discern it.
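
One way to make the observation less imprecise would be to log reachability with timestamps. Below is a rough Python sketch of that idea. The node names and addresses are hypothetical placeholders, it relies on the system `ping` command, and on Windows `ping` can exit with 0 even for a "Destination host unreachable" reply, so treat the results as approximate.

```python
import platform
import subprocess
from datetime import datetime

# Hypothetical ZeroTier-assigned addresses; replace with your own nodes.
NODES = {"node-a": "10.147.17.10", "node-c": "10.147.17.12"}


def ping_once(address):
    """Send a single ping and report whether the command exited successfully."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", address],
            capture_output=True,
            timeout=3,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_all(nodes, probe=ping_once, now=None):
    """Return one log line per node: timestamp, name, address, UP or DOWN."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    return [
        f"{stamp} {name} {address} {'UP' if probe(address) else 'DOWN'}"
        for name, address in nodes.items()
    ]


if __name__ == "__main__":
    for line in check_all(NODES):
        print(line)
```

Run from a scheduled task or cron job every few minutes and appended to a file, this would show when each node dropped and for how long, which might help reveal a pattern.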

Can anyone offer any hope that this problem is fixable? Thanks.