Synology Docker - routing table entries do not survive reboot

Hi all,

I’m planning to upgrade from DSM6 to DSM7 so I’m trying to get zerotier running in docker.
The instructions in Synology NAS | ZeroTier Documentation are great and thanks to them I’ve managed to get it up and running. However after a (host) reboot, it’s not recovering fully.

The tun driver starts fine, and ZeroTier starts and connects to the network. Unfortunately, the route entries are not restored. Simply adding the route again fails with a "device unavailable" error, which is confusing since the device shows up in ifconfig just fine.
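For reference, what I'm doing after a reboot looks roughly like this (the subnet and the ZeroTier interface name below are just examples from my setup, yours will differ):

# check the host routing table: the ZeroTier managed route is gone
route -n
# try to re-add it by hand; this is the step that fails with the device error
route add -net 172.22.0.0 netmask 255.255.0.0 dev ztabcdef123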

The only workaround I've found is leaving and rejoining the network, but even that is not always successful on the first attempt.

Has anyone else seen this?

Thanks in advance,

Timmmy

We are also seeing the same issue. Have you had any success with finding a solution?

I had to send a large backup off-site, so I wasn't able to run further tests.
My last attempt was to add a script that runs on boot as root, with some sleep time to make sure ZeroTier can catch up. Something like:

# -it dropped: there is no TTY when this runs as a scheduled boot task
sleep 2m
docker exec zt zerotier-cli leave <network>
sleep 1m
docker exec zt zerotier-cli join <network>

However, I wasn't able to get this to work 100% of the time, even when logged in to the terminal. Maybe you have more luck?
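If I get another chance to test, the next thing I want to try is waiting for the daemon to actually report ONLINE instead of using fixed sleeps. Untested sketch (same "zt" container name and <network> placeholder as above):

#!/bin/sh
# wait up to ~5 minutes for the ZeroTier daemon in the container to report ONLINE
for i in $(seq 1 30); do
    docker exec zt zerotier-cli info 2>/dev/null | grep -q ONLINE && break
    sleep 10
done
# then force the network config to be re-applied
docker exec zt zerotier-cli leave <network>
sleep 30
docker exec zt zerotier-cli join <network>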

It looks like the Docker container configures the network interfaces correctly when it joins a network, but not when the container starts. If your host has been restarted since the network was joined, this occurs. Could be a bug…

I had already tried that, but without the sleep. I’ll try now and report back. Support thinks it’s a routing issue, FWIW.

One thing I am noticing is that two of my NASes are completely unresponsive via ZeroTier. The rest that are having issues after a reboot seem to be communicating enough that our main NAS is getting CMS updates from them over ZeroTier, but we can't remote into them at all. Almost like it's working one-way.

So, you’re right, this works sometimes and sometimes does not. But I found that by running it a second time, it did work on the two that didn’t work the first time around. I’m not sure how helpful that’s going to be. I’m going to reach back out to support with this information.

Hi @cleverit,

Thanks for the info!

Issues seem to be getting worse here: while the NAS was running, the connection broke. I'm trying to recover, but everything seems to fail (rejoining the network, rebooting the NAS, restarting the container, redeploying the container, …).

Ifconfig looks good, the route looks good, zerotier status is online, but even a ping with the interface specified gets nothing through. (And in this case there seems to be no connection at all, no matter the protocol.)

@zt-joseph : Any thoughts? Would be much appreciated!

I’ve been monitoring this issue but don’t have a solution yet since I cannot replicate it. One question: everyone used the CLI as opposed to the docker package GUI to set this up, right? I’ve found that package to silently corrupt docker configurations.

Yes, CLI on all 20+ of my NAS devices. Followed the instructions exactly on each one.

Hi @zt-joseph,

Thanks for following along!

Over the CLI indeed.
Maybe an important nuance: I'm still on DSM6. @Cleverit: How about you?

Just managed to get it running again after shutting down all other docker containers and a leave/join.
Although I guess it’s rather a lucky shot.

I do have the impression that the route sometimes just vanishes from the routing table, but simply restoring the route either gives an error or does not fix the issue :confused:

All but two are DSM6. I have one DSM7 that works without fail after reboot and one DSM7 that fails after reboot.

Hmm.

  • When ZT is in an unreachable state, can you verify that the node ID is correct with docker exec -it zt zerotier-cli info? I wonder if in some cases the volume isn't mounting properly, preventing the daemon from finding its identity keys, so it generates new ones. (A quick way to check is sketched after this list.)

  • Are there any other VPN-type applications that might be interfering?

  • Is QuickConnect enabled?

  • Does this same issue happen when you docker start/stop the container?
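For that first point, a quick check looks like this (assuming you mapped the container's data directory to a host folder as in the docs, so adjust the path to whatever you used):

# node ID as the running daemon sees it (third field of the output)
docker exec -it zt zerotier-cli info
# node ID stored in the mapped volume; the part before the first colon should match
cat /volume1/docker/zerotier-one/identity.public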

I solved it: first I rebooted my NAS, then deleted the containers and images in turn, re-ran the installation documentation (Synology NAS | ZeroTier Documentation), and changed the IP in the ZeroTier control panel. I can now connect and access it, but the network speed is very slow.

@twelife2014 this seems like a different problem. If you create a separate ticket and describe what is wrong, I can help you there. As for the performance, you should check zerotier-cli peers and make sure everything is DIRECT and not RELAY.
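That check is just (same "zt" container name assumed):

docker exec -it zt zerotier-cli peers

Look at the link column in the output; the peers you're actually talking to should show DIRECT rather than RELAY.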

So I'm not sure what exactly you're looking for here. All I can tell you is that the response matches the status and looks fine.
(screenshot: CleanShot 2021-09-10 at 23.00.16)

So interestingly enough, I had a NAS lose its connection after activating Virtual Machine Manager, and I have not been able to get it to reconnect. The leave/join/leave/join method is not working this time around. I'm now wondering if any kind of change to the network interfaces is tied to the root cause, and a reboot is just the most obvious change to the network interface.

  1. No other VPN-type applications.
  2. In almost all cases QuickConnect is enabled, but for two of my NASes experiencing this issue, QuickConnect is not enabled and never has been.
  3. I just tested and restarting the container does cause this issue to happen.
  • Shows as online also while having issues
  • Volume mount works fine. I have mapped it to a host folder and I see the network files being added when joining (reverse when leaving). Furthermore join/leave works without needing to approve the host again in the console.
  • No other VPN type applications
  • QuickConnect disabled
  • Starting and stopping the container indeed also triggers it (quick repro sketched below).
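Quick repro on my box (again, "zt" is my container name and "ztif" stands in for my ZeroTier interface):

# restart the container; afterwards the managed route is gone again
docker stop zt && docker start zt
# the route for the ZeroTier subnet is missing and the interface misbehaves
route -n
ifconfig ztif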

@cleverit: Good point regarding changes to the network interface. That's indeed the common thing across all scenarios.

Is there a way to have Zerotier re-run the network config without leaving/joining a network?
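One thing I still want to try (untested, and I'm not sure it actually re-applies the managed routes) is toggling the managed-routes setting instead of leaving/joining:

docker exec -it zt zerotier-cli set <network> allowManaged=0
docker exec -it zt zerotier-cli set <network> allowManaged=1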

Almost forgot, an interesting possible coincidence: I also have Virtual Machine Manager running, although all VMs are stopped. Network-wise there is no special config there; all VMs are bound to one of the physical LAN ports.

Unrelated. I have devices with VMM never installed experiencing this issue.

The issue occurred again overnight (no network-config-related changes, but of course a short interruption could have happened). Restarting the container and restarting the NAS did not fix the issue.

Checked network config a bit further:

  • Route was gone from routing table
  • ZT interface (I'll refer to it below as "ztif") was still listed in ifconfig, but any command against it, e.g. "ifconfig ztif", fails with "error fetching interface information: Device not found"
  • Ran “docker exec -it zt ifconfig ztif up”
  • ifconfig ztif is also back up on the host. Route is still missing.
  • Added route manually
  • Everything is working again

So it looks like something causes the interface to go down (or ZT maybe takes it down because of connection issues) and it's not coming back up.

Note: I ran the ifconfig up command after a NAS reboot. Not sure if that would work without one.
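To sum up, the manual recovery that worked here was the following (container name "zt" and interface "ztif" as above; the subnet is just an example, use your own managed route):

# bring the ZeroTier interface back up from inside the container
docker exec -it zt ifconfig ztif up
# confirm the host sees it again
ifconfig ztif
# re-add the managed route that vanished from the host routing table
route add -net 172.22.0.0 netmask 255.255.0.0 dev ztif
# verify
route -n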