I tested a few scenarios. To benefit from the multithreading performance improvement, both ends must be Linux clients and both must have multithreading enabled.
Test environment: PVE virtual machines, two ZeroTier clients, Debian 12, x64.
Neither client nor server has multithreading enabled: 1.5G
Only one of client and server has multithreading enabled: 1.5G
Both client and server have multithreading enabled: 2.9-3G, which matches the 2x performance improvement claimed on the official blog.
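As a quick check of the 2x figure, the ratio can be computed from the numbers above. This is only a sketch of the arithmetic; the function name `speedup` is made up for illustration.

```python
def speedup(baseline: float, measured: float) -> float:
    """Return how many times faster `measured` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return measured / baseline

# Figures from the tests above: 1.5G without multithreading on both ends,
# 2.9-3G with multithreading enabled on both ends.
low = speedup(1.5, 2.9)
high = speedup(1.5, 3.0)
print(f"speedup: {low:.2f}x to {high:.2f}x")
```

With these figures the ratio comes out at roughly 1.93x to 2.00x.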
The test environment is server hardware with a fairly low CPU clock frequency, so the absolute numbers are not very high.
What confuses me is this: if the server starts a second iperf3 instance and I add a client2 → server test, the total bandwidth of the two tests is still only 3G.
I am not sure whether this result is expected. Maybe the bottleneck is now that the ZeroTier virtual network interface still has only one rx/tx queue? In any case, I did not investigate further.
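For the two-client test, one way to compare per-flow and total throughput is to run each iperf3 client with `-J` (JSON output) and add up the results. The following is a minimal sketch, assuming TCP tests whose JSON contains `end.sum_received.bits_per_second`; the script name and the file names in the usage comment are hypothetical.

```python
import json
import sys

def received_bps(result: dict) -> float:
    """Receiver-side throughput in bits/s from one `iperf3 -J` TCP result."""
    return float(result["end"]["sum_received"]["bits_per_second"])

def total_gbits(results: list) -> float:
    """Sum the receiver-side throughput of several results, in Gbit/s."""
    return sum(received_bps(r) for r in results) / 1e9

if __name__ == "__main__":
    # Usage (hypothetical names): python sum_iperf.py client1.json client2.json
    paths = sys.argv[1:]
    results = []
    for path in paths:
        with open(path) as f:
            results.append(json.load(f))
    for path, result in zip(paths, results):
        print(f"{path}: {received_bps(result) / 1e9:.2f} Gbit/s")
    print(f"total: {total_gbits(results):.2f} Gbit/s")
```

If the total stays near the single-flow figure while each flow drops to about half, that would be consistent with a bottleneck shared by both flows, although it does not show where that bottleneck is.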
Separately, the Linux tap driver has long supported XDP, so an XDP program can be used directly to send and receive data on the ZeroTier network.
Simple test result: an AF_XDP program receives TCP packets and forwards them to the iperf3 server on another machine. As expected, it fully utilizes ZeroTier's 3G bandwidth.
ZeroTier's tap NIC has only one rx/tx queue pair. So as ZeroTier's multithreaded performance improves in the future, this may become a bottleneck.
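The queue count of an interface can be checked from sysfs, where Linux lists one `rx-N` and one `tx-N` directory per queue under `/sys/class/net/<iface>/queues`. A small sketch follows; the helper name `count_queues` and the default interface name `ztexample0` are placeholders, so pass the real ZeroTier interface name shown by `ip link`.

```python
import os
import re
import sys

def count_queues(iface: str, sysfs_root: str = "/sys/class/net") -> tuple:
    """Return (rx_queues, tx_queues) for `iface` by listing its sysfs queues directory."""
    queues_dir = os.path.join(sysfs_root, iface, "queues")
    names = os.listdir(queues_dir)
    rx = sum(1 for name in names if re.fullmatch(r"rx-\d+", name))
    tx = sum(1 for name in names if re.fullmatch(r"tx-\d+", name))
    return rx, tx

if __name__ == "__main__":
    # "ztexample0" is a placeholder, not a real interface name.
    iface = sys.argv[1] if len(sys.argv) > 1 else "ztexample0"
    try:
        rx, tx = count_queues(iface)
        print(f"{iface}: {rx} rx queue(s), {tx} tx queue(s)")
    except FileNotFoundError:
        print(f"interface {iface!r} not found under /sys/class/net")
```

A result of one rx and one tx queue would match the single queue pair described above.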
A similar ref: Performance - ARM. eBPF and AES Offload