Synology Docker - routing table entries do not survive reboot

I’ve been monitoring this issue but don’t have a solution yet, since I cannot replicate it. One question: did everyone use the CLI, as opposed to the Docker package GUI, to set this up? I’ve found that package silently corrupts Docker configurations.

Yes, CLI on all 20+ of my NAS devices. Followed the instructions exactly on each one.

Hi @zt-joseph,

Thanks for following along!

Over the CLI indeed.
Maybe an important nuance: I’m still on DSM6. @Cleverit: How about you?

Just managed to get it running again after shutting down all other Docker containers and doing a leave/join, although I guess it was rather a lucky shot.

I do have the impression that the route sometimes simply vanishes from the routing table, but restoring the route either gives an error or doesn’t fix the issue :confused:

All but two are DSM6. I have one DSM7 that works without fail after reboot and one DSM7 that fails after reboot.

Hmm.

  • When ZT is in an unreachable state, can you verify that the node ID is correct with docker exec -it zt zerotier-cli info? I wonder if in some cases the volume isn’t mounting properly, preventing the daemon from finding its identity keys and causing it to generate new ones (see the sketch after this list).

  • Are there any other VPN-type applications that might be interfering?

  • Is QuickConnect enabled?

  • Does this same issue happen when you docker start/stop the container?
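
A minimal sketch of what I mean for the first point (this assumes the container is named zt and the data volume is mapped to /var/lib/zerotier-one inside the container, as in the docs):

# node ID the running daemon reports
docker exec -it zt zerotier-cli info
# identity stored on the mounted volume; the part before the first colon should match the node ID above
docker exec -it zt cat /var/lib/zerotier-one/identity.public

If the two don’t match, or identity.public is missing, the volume probably isn’t mounting correctly.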

I solved it: first I rebooted my NAS, then deleted the containers and images in turn, re-ran the installation documentation (Synology NAS | ZeroTier Documentation), and changed the IP in the ZeroTier control panel. I can now connect and access it, but the network speed is very slow.

@twelife2014 this seems like a different problem. If you create a separate ticket and describe what is wrong, I can help you there. As for the performance, you should check zerotier-cli peers and make sure everything is DIRECT and not RELAY.
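
For example (assuming the container is named zt, as earlier in this thread):

# each peer line shows either DIRECT or RELAY; RELAY means traffic is going
# through ZeroTier root servers, which is usually the cause of poor throughput
docker exec -it zt zerotier-cli peers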

So not sure what exactly you’re looking for here. All I can tell you is that the response matches status and looks fine.

So interestingly enough, I had a NAS lose connection after activating Virtual Machine Manager, and I have not been able to get it to reconnect. The leave/join/leave/join method is not working this time around. I’m now wondering if any kind of change to the network interface is tied to the root cause, and a reboot is just the most obvious change to the network interface.

  1. No other VPN-type applications.
  2. In almost all cases QuickConnect is enabled, but for two of my NAS’ experiencing this issue, QuickConnect is not enabled and never has been.
  3. I just tested, and restarting the container does indeed cause this issue to happen.
  • Shows as online even while having issues
  • Volume mount works fine. I have mapped it to a host folder and I see the network files being added when joining (reverse when leaving). Furthermore join/leave works without needing to approve the host again in the console.
  • No other VPN type applications
  • QuickConnect disabled
  • Starting and stopping the container indeed also triggers it.

@cleverit: Good point regarding changes to the network interface. That’s indeed the common thing across all scenarios.

Is there a way to have Zerotier re-run the network config without leaving/joining a network?

Almost forgot, an interesting possible coincidence: I also have Virtual Machine Manager running, although all VMs are stopped. Network-wise there is no special config there; all VMs are bound to one of the physical LAN ports.

Unrelated. I have devices with VMM never installed experiencing this issue.

The issue occurred again overnight (no network-config-related changes, but of course a short interruption could have happened). Restarting the container and restarting the NAS did not fix the issue.

Checked network config a bit further:

  • Route was gone from routing table
  • ZT interface (I’ll refer to it below as “ztif”) was still listed in ifconfig, but any command on it, e.g. “ifconfig ztif”, fails with “error fetching interface information: Device not found”
  • Ran “docker exec -it zt ifconfig ztif up”
  • ifconfig ztif is also back up on the host. Route is still missing.
  • Added route manually
  • Everything is working again

So it looks like something causes the interface to go down (or ZT maybe takes it down because of connection issues) and it’s not coming back up.

Note: I ran the ifconfig up command after NAS reboot. Not sure if that would work without.

What command did you use to re-add the route, and was it done within the Docker container or on the NAS? I’m wondering if I can incorporate that into my script and have it run every 5 minutes, checking whether it can reach the ZT network and fixing things if it can’t.

Thanks for all of your troubleshooting efforts. However, I still cannot replicate this issue on any of my units. What you’re saying certainly makes sense: a missing route and a down interface would cause these problems. Also, ZT never brings its interfaces down unless one leaves a network, so if it’s down, I strongly suspect something else is setting it to that state.

A little info on my main test setup:

  • DS216+II
  • DSM7
  • Docker version 20.10.3, build b455053
  • Did not use docker compose
  • Nothing else other than stock applications installed

I wonder if it’s related to this issue that Nebula also has on Synology docker. https://github.com/slackhq/nebula/issues/256

For now, I have a workaround in the form of a script that runs hourly and re-adds the route if it’s missing. I’m wondering though if the device ztwdjclgcv is static or whether it could change. Here is the script for anyone interested. We have all our client NAS’s connecting to the same ZeroTier network (with flow rules preventing communication between them), so this may not work for everyone and will have to be customized regardless. I did verify that this works against the 3 NAS’s that were currently unreachable.

#!/bin/sh
# Re-add the ZeroTier route if it has vanished from the routing table
EXIST=$(ip route show 192.168.XXX.0/24 | wc -l)
if [ "$EXIST" -eq 0 ]
then
    route add -net 192.168.XXX.0/24 dev ztwdjclgcv
fi
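
And in case the device name can change after all, something like this might work to discover the zt interface at run time instead of hardcoding it (an untested sketch, assuming only one ZeroTier network is joined on the host):

# pick up the first zt* interface on the host instead of hardcoding ztwdjclgcv
ZTIF=$(ls /sys/class/net | grep '^zt' | head -n 1)
route add -net 192.168.XXX.0/24 dev "$ZTIF"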

Looks like you’ve already added the comment to your script in the meantime :wink:
Thanks for sharing it.
In my case I would need to bring the interface back up first (I also added some sleep to make sure the interface is up before adding the route).

#!/bin/sh
# Bring the ZT interface back up inside the container first, then restore the route
EXIST=$(ip route show 192.168.XXX.0/24 | wc -l)
if [ "$EXIST" -eq 0 ]
then
    docker exec zt ifconfig ztwdjclgcv up
    # give the interface a moment to come up before adding the route
    sleep 10
    route add -net 192.168.XXX.0/24 dev ztwdjclgcv
fi

I’ll give it a go next time the connection drops.

@zt-joseph: Could be that the state is set by something else, but that’s really hard to debug. And the fact that ZT barely logs at all doesn’t really help, to be honest :confused:
Do you have a docker image with debug enabled?

I’m just wondering how the interface is supposed to come back up after a host or container restart (I don’t have a lot of Docker experience, so this might be a stupid question). On the host, /etc/sysconfig/network-scripts/ifcfg-ztxxxxxxx just contains BOOTPROTO=static (nothing like ONBOOT=yes, no IP range or netmask, …).
If the zt container only configures the interface when joining or leaving, and the host does not contain any config…

for debugging, try (sudo) ip monitor on your host, then start the zerotier container. I don’t know if this will work.
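
Something along these lines, although I’m not sure which parts of iproute2 DSM ships:

# print link, address and route changes as they happen while the container starts
sudo ip monitor link address route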

@cleverit: I’ve done some testing today with your script (with the ifconfig up added). Works like a charm :slight_smile:

I’ve not seen any disconnects over the last week, so I’ve been unable to dig further into the source of the issues.

Regarding the instructions (Synology NAS | ZeroTier Documentation): adding tun.sh as a boot script seems to do nothing (tun is starting just fine without it), so I have removed it. Because it’s added this way, it is also flagged by Security Advisor as a malicious boot script. Maybe it makes sense to add it as a boot task via the UI Task Scheduler instead, to avoid this? (If it’s needed at all.)
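
A quick way to check whether tun is already available right after a reboot without the script (a rough sketch; module handling can differ between DSM versions):

# is the tun module loaded and the device node present right after a reboot?
lsmod | grep -w tun
ls -l /dev/net/tun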

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.