Suspected DNS poisoning?

I will try to explain this the best I can. I have ZT installed on about 22 Windows machines. I have a bridge installed in my datacenter running on Ubuntu. It is there to bridge my DC subnet so the PCs have access to my servers.

On the ZT virtual NIC I also set static IPs for my DNS server at the DC, so that the PCs would resolve names and be able to talk to the domain.

Now for the interesting part. I have a /22 of public space with a bunch of servers in the same datacenter that all these PCs need to connect to. I prefer they connect through the WAN as it is a much faster connection. I never allowed this public network through the bridge via routes.

However, since I pushed ZT to my clients they have randomly lost access to the servers on the public network. I can add a route and allow global access and it works, just much slower.

When the PCs lose access to the public network, it lasts about 12 hours before they can connect again. It is random when it happens and it doesn’t happen to all of them at the same time. It seems like something might be getting poisoned by asymmetric routing because of the bridge or something.

Bridge Networks: 192.168.xxx.0/24

Any ideas? I am pulling my hair out and about to abandon ZT altogether.

I’ll preface by saying I have little to no experience in the land of Windows DCs, but I’m going to throw out a couple of ideas.

Is your ZeroTier network address range in any way overlapping with any other networks you’re trying to route to? That could definitely cause issues.
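A quick way to check for overlap is to compare the ranges programmatically. This is only a minimal sketch using Python's standard `ipaddress` module; the ranges below are made-up placeholders, not the ones from this thread, and `find_overlaps` is a hypothetical helper name.

```python
import ipaddress

def find_overlaps(zt_range, other_ranges):
    """Return the entries of other_ranges that overlap zt_range."""
    zt = ipaddress.ip_network(zt_range)
    return [r for r in other_ranges if zt.overlaps(ipaddress.ip_network(r))]

# Placeholder ranges for illustration only; substitute your own.
zt_range = "10.147.17.0/24"
routed = ["192.168.10.0/24", "10.147.0.0/16", "198.51.100.0/22"]

print(find_overlaps(zt_range, routed))  # ['10.147.0.0/16']
```

Any range that shows up in the result is worth a closer look, since the client has two candidate paths to those addresses.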

Are you trying to route over ZeroTier to public IP address spaces? If so, “Allow Global” must be enabled on all client endpoints in order to route to those addresses.

Since you’re connecting to your DC over ZeroTier, is Windows putting the ZeroTier IP addresses into the DNS server? That could cause communication issues for machines that aren’t on the ZeroTier network for sure.

You mentioned setting the DNS IP addresses on the ZeroTier adapter. ZeroTier doesn’t support this and may overwrite or remove those settings. You could look into rolling out a Group Policy update using the NRPT rules. I don’t know if/how this works with connecting to a DC however.

A DC is just a combination of DNS/LDAP. When a Windows client wants to authenticate, it looks up which LDAP server is available using DNS. By default and without using NBT, a Windows client’s primary DNS must point to a DNS server for the domain, or go through a forwarder to the server of authority (SOA). Search order, etc. can be changed using domain policies.

When the client refuses to talk to the DC, check what is happening using DCDiag. There are a bunch of other Microsoft tools that can be used to troubleshoot this.

It never stops communicating with the DC. That is not the issue. The issue is it stops talking to the servers that are on the public space. Those servers reside in my DC but not in my domain. However, my firewall handles traffic for both.

In looking at your network configuration, there’s at least one, and possibly two issues.

  1. You have a route specified to what I’m assuming is your public /22 address space via a node on your ZeroTier network. That means the nodes running ZeroTier must have Allow Global set for the network. ZeroTier will not route to non-private IP addresses by default. The Allow Global flag will allow that. This will likely get you halfway there.

  2. Machines in your public /22 need a route back to your ZeroTier network in order to reply. Setting Allow Global allows the packets to get from your ZeroTier network to the public address, but if those machines have no route back to the ZeroTier network, they’ll just be firing the replies off into the ether.

If your requirement is for all traffic to those public IPs to go through your ZeroTier network, you’ll have to complete 1 & 2 above. If you don’t need the traffic to your /22 to go over ZeroTier, remove the route from your ZeroTier network and it will go through the normal internet routes to those addresses.
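To see which managed routes fall under point 1, something like the following could be used. It is a rough sketch that assumes "non-private" means "outside the RFC 1918 ranges"; ZeroTier's actual check may differ. The route list is a placeholder, `needs_allow_global` is a hypothetical helper, and the code handles IPv4 only.

```python
import ipaddress

# RFC 1918 private ranges (assumption: used here as the definition of "private").
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def needs_allow_global(route):
    """True if the IPv4 route is not fully inside an RFC 1918 range."""
    net = ipaddress.ip_network(route)
    return not any(net.subnet_of(private) for private in RFC1918)

# Placeholder routes for illustration only.
for route in ["192.168.10.0/24", "198.51.100.0/22"]:
    print(route, "needs Allow Global:", needs_allow_global(route))
```

A route that comes back `True` is one the clients would ignore unless Allow Global is set, and the hosts behind it would still need a return route as described in point 2.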

When I route the public space through ZT it works just fine. The problem with that is it takes my latency from low double digits to high triple digits.

What do you mean by lose access? What troubleshooting have you tried?
Is the client failing to reach the server by name and/or by IP?
Where does a traceroute go?
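To separate a name-resolution failure from a routing failure, a small check along these lines can be run from an affected client while the problem is happening. This is only a sketch using Python's standard `socket` module; `check_host` is a hypothetical helper, and the port and timeout are assumptions to adjust for your servers.

```python
import socket

def check_host(host, port=443, timeout=3.0):
    """Resolve host, then try a TCP connect to each resolved address."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # The name did not resolve: points at DNS rather than routing.
        return {"resolved": [], "connected": False, "error": f"DNS: {exc}"}

    addrs = sorted({info[4][0] for info in infos})
    last_error = None
    for addr in addrs:
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                return {"resolved": addrs, "connected": True, "error": None}
        except OSError as exc:
            # Resolved but unreachable: points at routing or filtering.
            last_error = exc
    return {"resolved": addrs, "connected": False, "error": f"connect: {last_error}"}
```

Calling `check_host` once with the server's FQDN and once with its IP address shows whether the failure is in resolution, in reachability, or both. It is also worth comparing the resolved addresses against the ones you expect, since a wrong answer from DNS looks different from no answer.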

It is random. When I say it loses access I mean that the client can’t see the servers. Sometimes the FQDN will resolve and other times it won’t.

When it does resolve, a tracert will take it as far as my BGP peer with the ISP and then die. This is why I think it could be DNS poisoning, but I am not smart enough to figure out what/where it may be getting flagged and blocked.
