I get disconnected from ZeroTier every day around 23:00 and get a REQUESTING_CONFIGURATION error

I experience disconnections from ZeroTier consistently around 23:00 each day, accompanied by a “REQUESTING_CONFIGURATION” error. It typically reconnects on its own after a couple of hours, or I have to manually quit ZeroTier and restart it from services to regain connection. Does anyone have any suggestions or ideas to resolve this issue?

The affected machine runs Windows Server 2022.

Any chance that your ISP does some kind of force reset of connections and DHCP renewal on the router?

I don’t think so. We have two servers connected to the same router. The server with Windows Server 2012 has never had any issues. Only the server with Windows Server 2022 is experiencing this problem.

There’s certainly nothing ZT is doing on its own that would cause that. It’s pretty rare that it would flip the network status from OK to REQUESTING_CONFIGURATION on its own. Do you have a script or automated task restarting the ZeroTier service every day at that time by any chance? Perhaps some sort of overzealous IDS/firewall on the network that decides it doesn’t like all of the encrypted UDP packets? Those are the only things I can think of off the top of my head.

I’ve identified the issue, but I’m unsure how to resolve it. Whenever Veeam Backup starts on the backup VM to retrieve an image from the main VM, ZeroTier connectivity on the main VM drops for a couple of hours. I have triggered the backup manually several times, and ZeroTier consistently stops when the backup process begins. I’m unsure if this issue is related to the ESXi host. Does anyone have suggestions on how to prevent the backup VM from disrupting ZeroTier connectivity on the main VM?

I’m not familiar enough with Veeam or Windows Server to even speculate on what the cause could be. Have you tried restarting the ZeroTier service after it gets into this state? That may help give things a kick.

Ah - Veeam is taking a snapshot of the VM in order to do the backup from the locked vmdk file. This process sometimes applies what is called a “stun” to the VM so that it can quiesce IO to the storage and ensure there’s no outstanding IO in flight when it commits the snapshot. This basically tells the VM to stop working briefly - with “briefly” being determined by the size, IO activity and back-end storage performance. This is probably interrupting the ZeroTier service long enough to provoke the disconnection. You will then get a similar thing happening when the backup is finished and the snapshot is deleted and its contents committed to the master vmdk file.

I can’t speak as to why the connection doesn’t get reestablished immediately afterwards though.

The usual solution for IO-sensitive VMs is either to back them up using something inside the OS (Veeam Agent), so there’s no snapshot operation at the hypervisor level, or to ensure that they are on really fast storage. See Veeam KB1681, “VM Loses Connection During Snapshot Removal”, for some troubleshooting tips.

It’s not a Veeam thing, it’s a snapshot thing. (I’ve seen this with other backup tools that use snapshots)

Thanks for that tip! Can a “stun” take longer than 30 seconds?

If you restart the ZeroTier system service after a backup, you’ll probably be OK. Maybe you can script it.

PS C:\Windows\system32> Stop-Service -Name ZeroTierOneService
PS C:\Windows\system32> Start-Service -Name ZeroTierOneService

Stun effects can last a long time if the VM has lots of active write I/O and the backing storage is slow, but under normal conditions a stun should be practically invisible and cost no more than a couple of lost pings.

But Veeam and most other backup products include affordances for running scripts before and after the snapshot operations (historically designed to do things like manually shut down databases and flush outstanding writes to disk). Ref: Pre-Freeze and Post-Thaw Scripts - User Guide for VMware vSphere

I’d try using your service stop/start commands in these scripts and see if that makes things more reliable.
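As a rough illustration of what such a post-thaw script could do, here is a minimal Python sketch that only restarts the service when a network is actually stuck. It assumes `zerotier-cli -j listnetworks` prints a JSON array whose entries carry `nwid` (or `id`) and `status` fields; the function names, the `ZEROTIER_CLI` value, and the sample network IDs in the usage note are hypothetical. On Windows you may need the full path to `zerotier-cli.bat`, and the restart needs an elevated session.

```python
import json
import subprocess

# Assumption: the CLI is reachable under this name; adjust to your install path.
ZEROTIER_CLI = "zerotier-cli"
# Service name taken from the Stop-Service / Start-Service commands above.
SERVICE_NAME = "ZeroTierOneService"


def networks_not_ok(listnetworks_json):
    """Return the IDs of joined networks whose status is anything but OK."""
    networks = json.loads(listnetworks_json)
    stuck = []
    for net in networks:
        if net.get("status") != "OK":
            # Fall back to "id" if "nwid" is absent (field names assumed).
            stuck.append(net.get("nwid") or net.get("id") or "unknown")
    return stuck


def restart_zerotier(service_name=SERVICE_NAME):
    """Restart the Windows service via PowerShell's Restart-Service."""
    subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         "Restart-Service -Name " + service_name],
        check=True,
    )


def post_thaw_check():
    """Query ZeroTier and restart it only if a network is not OK."""
    result = subprocess.run(
        [ZEROTIER_CLI, "-j", "listnetworks"],
        capture_output=True, text=True, check=True,
    )
    stuck = networks_not_ok(result.stdout)
    if stuck:
        restart_zerotier()
    return stuck
```

Calling `post_thaw_check()` from the post-thaw hook would leave a healthy service alone and only bounce it when something like `REQUESTING_CONFIGURATION` shows up. The parsing step can be tried on its own, for example with `networks_not_ok('[{"nwid": "aaaa", "status": "OK"}, {"nwid": "bbbb", "status": "REQUESTING_CONFIGURATION"}]')`.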

By the way, I think this will possibly be fixed in the next version of zerotier-one which is coming out soon.

Thanks for the detailed explanation, Erik. My issue is resolved, but I’m unsure exactly how.

Here’s what I did:

  1. I changed the backup time to 3:00 AM to minimize disruption during business hours.
  2. I discovered the ESXi server had an incorrect time setting. This caused the backup process to inadvertently change the main VM’s time to the next day (no idea why or how). I corrected the ESXi server time.
  3. I installed ZeroTier on the backup VM and uninstalled it afterwards (don’t ask why; I was desperate).
  4. I restarted the backup VM.

While these seem like unrelated steps, restarting the backup VM might have been the key action that resolved the problem. I am writing this in case it helps someone else. Always start with a restart, I guess :)

The incorrect ESXi time (step 2) is almost definitely the source of your issue. Under the hood, Certificates of Membership for ZeroTier networks are all timestamped. If your system clock is way off, this WILL cause problems!
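If you want to catch this kind of clock jump early, a small sanity check along these lines might help. It is only a sketch: it compares the local time against a reference time you trust (how you obtain that reference, for example from an NTP source or another machine, is left open), and the five-minute `MAX_SKEW` is a hypothetical threshold for illustration, not a value taken from ZeroTier.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance for this sketch; not a ZeroTier constant.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(local_now, reference_now):
    """Absolute difference between the local clock and a trusted reference."""
    return abs(local_now - reference_now)


def clock_is_sane(local_now, reference_now, max_skew=MAX_SKEW):
    """True if the local clock is within max_skew of the reference."""
    return clock_skew(local_now, reference_now) <= max_skew


# Illustrative values: a VM whose clock was pushed one day ahead,
# as described in step 2 above.
reference = datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)
vm_clock = reference + timedelta(days=1)
if not clock_is_sane(vm_clock, reference):
    print("Clock skew too large:", clock_skew(vm_clock, reference))
```

Running a check like this right after the backup window would flag a VM whose time has jumped to the next day before ZeroTier connectivity becomes the first visible symptom.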
