Packet Loss w/ High Packets per Second (PPS)

We introduced ZeroTier (ZT) into our mixed infrastructure, which consists of dozens of physical servers and VMs. The VMs share a Linux bridge with the physical servers.

Unfortunately, we see packet loss at high packet rates (>100k pps).
All peers are directly reachable (no relaying). The servers are using at most 32% of their physical line rate (measured with ifpps).

After analysis with dropwatch, the drops appear to happen for the zerotier process in udp_queue_rcv_one and nowhere else in the network stack. This suggests that the zerotier process is too slow to keep up with the rate of incoming packets. Tuning txqueuelen and the UDP receive buffers hasn't helped so far. We also tried a trusted path over a dedicated Ethernet link between two servers, without any notable improvement.
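
For anyone who wants to cross-check the dropwatch result, here is a minimal sketch (not something we ran for the numbers above) that samples the kernel's UDP counters from /proc/net/snmp twice and prints the difference. On Linux, RcvbufErrors counts datagrams dropped because a socket's receive buffer was full, so a value that grows under load would be consistent with the receiving process not draining its socket fast enough. The function names and the 10-second interval are assumptions made for illustration, and the counters are system-wide, not specific to zerotier.

```python
from pathlib import Path
import time


def parse_udp_counters(snmp_text):
    """Return the 'Udp:' counters from /proc/net/snmp-style text as a dict."""
    # The file holds a header line and a value line per protocol; the
    # "Udp:" prefix (with colon) deliberately excludes "UdpLite:" lines.
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no Udp header/value pair found")
    header, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def counter_delta(before, after, name):
    """Difference of one counter between two samples."""
    return after[name] - before[name]


if __name__ == "__main__":
    snmp = Path("/proc/net/snmp")   # Linux only
    interval = 10                   # seconds; arbitrary choice for this sketch
    first = parse_udp_counters(snmp.read_text())
    time.sleep(interval)
    second = parse_udp_counters(snmp.read_text())
    for name in ("InDatagrams", "InErrors", "RcvbufErrors"):
        print(name, counter_delta(first, second, name))
```

Running it while the high-pps traffic is flowing and comparing the RcvbufErrors delta with the InDatagrams delta gives a rough system-wide drop ratio for the interval.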

Is there anything we can tune or improve?
