Unable To Saturate 10Gb Link

I am testing the idea of using ZT as a Kubernetes backbone network to allow for geographically distributed control planes/workers with encrypted traffic between them. I have a three-node control plane geographically distributed in the cloud and two three-node worker clusters at two separate data centers. I am trying to get the worker clusters to fully saturate the 10Gbps links between nodes. I have enabled trusted paths between the nodes at each data center, but I'm only seeing around 2Gbps throughput between nodes over the ZT interface. Testing the physical interface directly, I get just under 10Gbps. The links are dual 10Gb connections bonded with LACP, so it's technically a 20Gbps link, but since a single TCP stream can only use one member link at a time, I only expect to see 10Gbps.

Any insight as to why I’m only seeing the 2Gbps would be greatly appreciated.
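For reference, this is roughly how I'm measuring throughput over the ZT interface (the address is a placeholder for the peer's ZeroTier-assigned IP, and flags are standard iperf3 options):

```shell
# On the receiving node, start an iperf3 server:
iperf3 -s

# On the sending node, run a 30-second test with 4 parallel streams
# against the peer's ZeroTier address (placeholder shown):
iperf3 -c 10.147.0.2 -t 30 -P 4
```

Running the same test against the physical interface's address is how I got the just-under-10Gbps baseline.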

2Gbps is actually not bad, at least with the current version. One of the major bottlenecks is that the main I/O loop is single-threaded; we are working on this in our next-gen code base. The bigger bottleneck, though, is encryption, and that will always add some overhead.

Do you actually need that kind of speed if it's geographically distributed? I'm not sure geographically distributed traffic is going to be stable at that speed unless you are running over someone's private backbone. The global Internet can do it, but you will occasionally encounter glitches in latency or packet loss as BGP routes flip over, etc.

The single-threaded process bottleneck makes sense, but an 80% decrease in throughput seems like a lot. The encryption would also make sense, but I am (in theory) using trusted paths within each data center network to eliminate that bottleneck. Is there a way to confirm that the trusted paths are working the way they're supposed to?
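For reference, my trusted path setup looks roughly like this in each node's local.conf (the subnet and path ID are placeholders; as I understand it, both ends of a path must agree on the same trustedPathId for encryption to actually be skipped):

```json
{
  "physical": {
    "10.10.0.0/24": {
      "trustedPathId": 101010024
    }
  }
}
```

If the IDs or subnets don't match on both sides, traffic silently falls back to the normal encrypted path, which could explain the numbers I'm seeing.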

I only need the speed within each datacenter because we are distributing terabytes of data across nodes and I would like that to be able to happen as fast as possible without disrupting other workloads in the cluster.

You probably shouldn’t rely on trusted paths. They’re deprecated and will be removed in future versions.

Ah, well that’s good to know. If that’s the case and any method of disabling encryption on specified networks/interfaces is going away, then I will probably need to redesign my architecture to move cluster communication (or at very least storage traffic) off of the ZT interfaces. Thanks for the info.

I should say: trusted paths will be removed in the same future version(s) when better I/O threading is introduced.

As far as overhead goes: bare metal networking is simple. You just write to memory, the network card DMAs that, and it spews it out over the wire. As soon as you introduce any encryption or encapsulation, you are running multiple passes of actual code over the data before you send it, and that will always add overhead. A theoretical minimum can be estimated from your CPU's clock and RAM speeds, but encryption usually costs at least a few cycles per byte, so real throughput will always be worse than that.
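As a back-of-envelope sketch of that ceiling (the clock speed and cycles-per-byte figures below are illustrative assumptions, not measurements of any particular cipher or CPU):

```python
# Rough upper bound on single-core encrypted throughput.
# Assumes a 3 GHz core and ~3 CPU cycles per byte for the cipher
# alone; real stacks add memory copies, encapsulation, and
# syscall overhead on top of this.

clock_hz = 3_000_000_000   # assumed 3 GHz core
cycles_per_byte = 3        # assumed optimistic cipher cost

bytes_per_second = clock_hz / cycles_per_byte
gbps = bytes_per_second * 8 / 1e9

print(f"~{gbps:.0f} Gbps ceiling on one core")  # ~8 Gbps
```

Even under these generous assumptions a single core tops out in the high single digits of Gbps, before counting any of the other per-packet work, which is why a single-threaded I/O loop lands well below wire speed.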

REALLY fast encrypted links, such as Google's encrypted fiber optic links between data centers, use custom ASICs to do encryption at wire speed. The protocol, of course, is bare-bones and custom as well.


2Gbps is not bad for a software network overlay, but it could be better with multithreading and a bit more optimization.

I’ve been doing some reading up on version 2.0 and I'm wondering if there is a timeframe for when the new version will be available. Is there a beta version I can download and test on our infrastructure?