Strange (MTU-Related?) packet loss, Win7 <-> Win Server 2019


Just set up a small (10-ish node) network which allows people from a certain company to reach a server that has no inbound public access. The server sits at a static IP on its ZT network and is accessed from client systems also joined to that network (with dynamically allocated addresses). The server runs Windows Server 2019 with Microsoft Remote Desktop Services running (including RDWeb and Remote Applications). Most users are having no problem with this configuration, but one guy frequently cannot access the server even though he’s joined to the network. One of the odd things about this is that sometimes it works fine. But other times, it does not.

In case you’re not familiar with Remote Desktop in all its glory, the “RDWeb” environment in use here means that users first go to an https web address (hosted on the server). There they log in with appropriate credentials and are then presented with a page of application icons. They click an icon of their choice, and the app is “remoted” to their screen through some smoke and mirrors occurring at the display driver level.

So this all works for the most part (as it should – this technology is fundamentally 20 years old), but for this one guy, sometimes when he brings up the web page, the page does not appear to load properly, and eventually the browser times out and gives him a “cannot reach” error. Yet the server is up just fine, and nobody else has problems.

I did some troubleshooting on this and I believe the problem may have something to do with packet size. If I ping the server (through its VPN address) from the client with the default packet size, everything goes fine, but if I ping with specific packet sizes, ping responses stop somewhere between 1300-byte pings (those work) and 1310-byte pings (those do not). This happens, by the way, regardless of whether I specify the “do not fragment” option. It seems that packets over 1300-ish bytes will not traverse the ZT tunnel for this guy.
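For anyone who wants to narrow down the exact cutoff rather than stepping through sizes by hand, here is a rough sketch of the same test as a binary search. It is only an illustration: the function names and the `TARGET` variable are made up for this example, and `probe` assumes Linux iputils `ping` run from another node on the same ZT network. It also assumes the cutoff is sharp (everything at or below it passes, everything above fails), which matches what was observed above.

```shell
# probe SIZE -> exit 0 if a single ping with SIZE payload bytes is answered.
# Assumption: Linux iputils ping; TARGET holds the server's ZT address.
# "-M do" sets the don't-fragment bit, "-s" sets the ICMP payload size.
probe() {
  ping -c 1 -W 2 -M do -s "$1" "$TARGET" >/dev/null 2>&1
}

# find_max_payload LOW HIGH
# Prints the largest payload size in [LOW, HIGH] that probe accepts.
# Returns 1 and prints nothing if even LOW fails.
find_max_payload() {
  local low=$1 high=$2 mid
  probe "$low" || return 1
  while [ "$low" -lt "$high" ]; do
    mid=$(( (low + high + 1) / 2 ))   # round up so the range always shrinks
    if probe "$mid"; then
      low=$mid
    else
      high=$(( mid - 1 ))
    fi
  done
  echo "$low"
}

# payload_to_packet PAYLOAD
# IPv4 packet size = ICMP payload + 20-byte IP header + 8-byte ICMP header
# (assuming no IP options).
payload_to_packet() {
  echo $(( $1 + 28 ))
}

# Example (hypothetical address):
#   TARGET=10.147.17.1
#   max=$(find_max_payload 1200 1500) && payload_to_packet "$max"
```

On the Windows client itself, a single probe of the same kind is `ping -f -l 1300 <server address>`, where `-l` is also the payload size, so the same 28-byte offset applies when comparing a ping size against an adapter MTU.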

As an experiment, I used Windows command line tools to set the MTU for the ZT network “adapter” to 1300, and once this is done, everything works great. But I assume this setting will not survive reboots (or possibly even turning the ZT connection off and on again), so this isn’t a good long-term solution.

Does this sound like anything anybody else has run across? The one thing I know that’s unusual about the client computer in general is that it’s running Windows 7 (yes, I know). Could it be as simple as that? I don’t see (here, in the KB, or via Google searches) anything suggesting there are packet size issues between Windows 7 and ZT-connected systems, but maybe I’m the trailblazer here.

Thoughts, anyone? Thanks.

Could just be a Windows 7 thing. What kind of internet connection is this device on?

It’s possible to change the MTU on the ZeroTier network, but it’s not exposed in the UI. It’s very rarely a solution. But if it works for you and you want:

curl -X POST "${NETWORK_ID}" -H "Authorization: bearer ${TOKEN}" -d '{"config": {"mtu": 1300}}'
You’ll have to leave and rejoin the network to get the MTU to change. You may have to restart the ZeroTier service.

Thanks, Travis. It does indeed look like a Windows 7 thing, as a second user from this group has now encountered the problem. Her symptoms were a bit different from the first guy’s, but in general both were experiencing problems that looked to be caused by incomplete transmission of network data. And her system was Windows 7 as well. I believe these two people are the only ones in this group that still have Windows 7 devices.

On both people’s computers, I was able to resolve the problem by setting the MTU (for the pseudo-adapter created by ZT) within Windows. If I set this as a persistent MTU change, Windows appears to continue to honor it across boots and (thus) across starts and stops of the zerotier service as well.

So I have a workaround which seems to be effective; I think it is probably preferable to setting the MTU for the entire network (I presume that’s what your API call does, since it’s passed the Network ID and not the Node ID). So, I’ll probably stick with my workaround. And I will be updating both of these computers to Windows 10 (or maybe 11) soon, so then I’ll know whether Windows 7 really was the problem. Evidence is certainly pointing that way.

That said, this issue might be something somebody there would want to look at. Whether Microsoft likes it or not (or whether I do), there are still quite a few Windows 7 devices out there. If this is a problem for all Windows 7 systems, that would be … well, bad. :)

