Small MTU size causing kernel crash

We have identified an issue when using a ZeroTier connection over GSM modules: when the GSM interface has a low MTU, the kernel crashes.

The GSM interface (wwan0) is set to an MTU of 1358, which is the default value assigned to it. When packets larger than the MTU (which therefore have to be fragmented) are generated inside a Docker container (container MTU 1500), the kernel crashes; sometimes a single ping -s 2800 is enough. Example logs are below. This only happens when the traffic (packets larger than 1500 bytes, so that fragmentation occurs) goes over GSM through the ZeroTier VPN interface. If we use ZeroTier over Wi-Fi (the wlan0 interface), we do not see any issues.
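To make the fragmentation concrete, here is a small sketch that computes how many IPv4 fragments a datagram needs at a given link MTU. It assumes a 20-byte IPv4 header with no options and an 8-byte ICMP header, and it models plain IP fragmentation only: it does not account for ZeroTier's UDP encapsulation overhead or for how ZeroTier itself splits packets, so the exact numbers for traffic leaving wwan0 may differ.

```python
import math

IPV4_HEADER = 20  # assumption: minimal IPv4 header, no options
ICMP_HEADER = 8   # ICMP echo header


def ipv4_fragment_count(datagram_len: int, link_mtu: int) -> int:
    """Number of IPv4 fragments needed to send a datagram of datagram_len
    bytes (IP header included) over a link with the given MTU."""
    if datagram_len <= link_mtu:
        return 1
    payload = datagram_len - IPV4_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes,
    # and each fragment repeats the 20-byte IP header.
    per_fragment = (link_mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(payload / per_fragment)


def ping_datagram_len(ping_size: int) -> int:
    """IP datagram length produced by `ping -s ping_size` (ICMP echo over IPv4)."""
    return ping_size + ICMP_HEADER + IPV4_HEADER


if __name__ == "__main__":
    size = ping_datagram_len(2800)  # 2828 bytes on the wire
    for mtu in (1358, 1500, 2800):
        print(f"MTU {mtu}: {ipv4_fragment_count(size, mtu)} fragment(s)")
```

For the ping -s 2800 case (a 2828-byte datagram), this gives 3 fragments at an MTU of 1358 and 2 at an MTU of 1500.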

We have reproduced the same error with GSM modules from different hardware vendors and on multiple compute modules, to rule out a hardware-specific cause. The ZeroTier interface uses its default MTU of 2800; setting it to smaller values (such as 1358) did not help.

The only “fix” we have found is to set the GSM (wwan0) interface to a larger MTU (for example 1500). With the larger MTU we do not see any kernel crashes.
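For checking and applying that workaround from a script, here is a minimal sketch that reads and writes an interface's MTU through sysfs (/sys/class/net/<iface>/mtu). Writing the file needs root, takes effect immediately in the same way as ip link set dev wwan0 mtu 1500, and does not persist across reboots. The sysfs_root parameter exists only so the functions can be exercised against a scratch directory, and the 68 to 65535 range check is our own sanity check; the kernel and the modem driver may enforce narrower limits.

```python
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")


def read_mtu(iface: str, sysfs_root: Path = SYSFS_NET) -> int:
    """Return the current MTU of a network interface as reported by sysfs."""
    return int((sysfs_root / iface / "mtu").read_text().strip())


def set_mtu(iface: str, mtu: int, sysfs_root: Path = SYSFS_NET) -> None:
    """Set the MTU by writing to sysfs (needs root / CAP_NET_ADMIN on a real system).

    Runtime-only change, similar in effect to `ip link set dev <iface> mtu <mtu>`.
    """
    # Sanity check only (68 is the IPv4 minimum); the driver may reject other values too.
    if not 68 <= mtu <= 65535:
        raise ValueError(f"unreasonable MTU {mtu}")
    (sysfs_root / iface / "mtu").write_text(f"{mtu}\n")


if __name__ == "__main__":
    # Print the MTUs of the two uplinks compared in this report.
    for iface in ("wwan0", "wlan0"):
        try:
            print(iface, read_mtu(iface))
        except FileNotFoundError:
            print(iface, "not present")
    # Applying the workaround (as root): set_mtu("wwan0", 1500)
```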

Could this be a bug in the Linux kernel or in ZeroTier (or related to how ZeroTier uses the network TAP interface)?

The test system is a Raspberry Pi CM3 running balenaOS.

Some more details about the tests we ran:

  • GSM traffic works fine when ZeroTier traffic is blocked.
  • The crash occurs when traffic goes over GSM and containers communicate with the cloud via ZeroTier.
  • If ZeroTier is running but no containers communicate via ZeroTier, the box stays online.
  • The crash also happens from host-network-mode containers (as well as from bridge-mode containers).
  • The latest ZeroTier release (1.8.9) also shows the issue.
  • Replacing the GSM module with one from a different vendor did not make the issue go away.
  • The issue is present on around 8 compute modules (ruling out a problem with one specific module).
  • Upgrading the OS (balenaOS 2.72.0 and 2.95.8) did not resolve the issue.
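Since the crash appears to need traffic from a container going through ZeroTier, a traffic generator that runs unprivileged inside a container may help narrow it down. The sketch below swaps ping's ICMP echo for UDP: it sends datagrams with a 2800-byte payload, which on Linux with default socket settings the kernel fragments locally when they exceed the MTU of the outgoing interface. Whether UDP triggers the crash the same way the ICMP case does is an assumption, and the function name is our own.

```python
import socket
import time


def send_oversized_datagrams(host: str, port: int, size: int = 2800,
                             count: int = 10, interval: float = 0.5) -> int:
    """Send `count` UDP datagrams with a `size`-byte payload to host:port.

    Returns the total number of payload bytes handed to the kernel. The kernel,
    not this code, splits each datagram into IP fragments when it is larger
    than the MTU of the interface it leaves through.
    """
    payload = b"\x00" * size
    total = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for i in range(count):
            total += sock.sendto(payload, (host, port))
            if i + 1 < count:
                time.sleep(interval)  # pace the traffic instead of flooding the GSM link
    return total
```

For example, send_oversized_datagrams("192.0.2.10", 9999) run inside a container, where 192.0.2.10 is a documentation-only placeholder for a peer's ZeroTier address and 9999 is an arbitrary port. The peer does not need a listener for the datagrams to be fragmented and sent.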

Logs of the crashes:

[ +0.000052] br-a678f29ddfc0: port 10(vethba3cece) entered disabled state
[Apr29 21:13] dwc_otg: DEVICE:005 : update_urb_state_xfer_comp:750:trimming xfer length
[ +0.032241] python3: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=docker-c128e732dc738a10d2a03a2ded16993b655cb2e88f2631b91337dbd61d89e6b4.scope,mems_allowed=0
[ +0.000034] CPU: 0 PID: 4191 Comm: python3 Tainted: P C O 5.4.83-v7 #1
[ +0.000005] Hardware name: BCM2835
[ +0.000005] Backtrace:
[ +0.000017] [<8010e1a4>] (dump_backtrace) from [<8010e518>] (show_stack+0x20/0x24)
[ +0.000008] r7:ffffffff r6:00000000 r5:600b0113 r4:816a2b8c
[ +0.000011] [<8010e4f8>] (show_stack) from [<80a1a024>] (dump_stack+0xd4/0x118)
[ +0.000012] [<80a19f50>] (dump_stack) from [<802eda88>] (warn_alloc+0xe0/0x174)
[ +0.000009] r10:00000000 r9:89936000 r8:00000000 r7:ffffe000 r6:80cf2928 r5:00000000
[ +0.000005] r4:00000000 r3:816077c8
[ +0.000011] [<802ed9a8>] (warn_alloc) from [<802eec70>] (__alloc_pages_nodemask+0x1154/0x1240)
[ +0.000005] r3:00000000 r2:80cf2928
[ +0.000007] r7:00000a20 r6:b9cd97e4 r5:00000201 r4:00000200
[ +0.000012] [<802edb1c>] (__alloc_pages_nodemask) from [<80836a58>] (skb_copy_ubufs+0xe0/0x540)
[ +0.000009] r10:b78ab000 r9:00000001 r8:b78ab058 r7:000033b4 r6:b9cd97e4 r5:00000a20
[ +0.000004] r4:00088dca
[ +0.000010] [<80836978>] (skb_copy_ubufs) from [<808505b0>] (__netif_receive_skb_core+0xb98/0xc58)
[ +0.000009] r10:b78ab000 r9:00000001 r8:b78ab058 r7:00000008 r6:b78ab040 r5:8160a938
[ +0.000005] r4:899ed0c0
[ +0.000008] [<8084fa18>] (__netif_receive_skb_core) from [<808506bc>] (__netif_receive_skb_one_core+0x4c/0x90)
[ +0.000009] r10:0000012c r9:00000001 r8:00000000 r7:00000040 r6:be5821c8 r5:00000000
[ +0.000004] r4:b78ab000

Or, in other cases:

[Apr29 23:10] IOCTL failed: ec9d41ba id=0x10000, sub_id=0x10003 action=1, status_code=0x80000007
[Apr29 23:14] dwc_otg: DEVICE:005 : update_urb_state_xfer_comp:750:trimming xfer length
[ +0.154469] ksoftirqd/0: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
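Both variants share the same signature: an order:0 allocation (a single page) with mode 0xa20 (GFP_ATOMIC), that is, an allocation made in a context that cannot sleep and therefore cannot wait for direct memory reclaim. In the first trace it is reached from skb_copy_ubufs in the receive path (__netif_receive_skb_core). When collecting more of these logs, a small parser can show which task and which cgroup hit the failure each time. This is a hypothetical helper matching the line format shown above; other kernel versions may format the line differently.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple


class AllocFailure(NamedTuple):
    comm: str       # task that hit the failure, e.g. "python3" or "ksoftirqd/0"
    order: int      # allocation order: 2**order contiguous pages were requested
    gfp_mode: str   # raw GFP mask as printed, e.g. "0xa20"
    gfp_flags: str  # flags decoded by the kernel, e.g. "GFP_ATOMIC"
    cpuset: str     # cgroup cpuset; "docker-<id>.scope" points at a container


ALLOC_FAILURE_RE = re.compile(
    r"\]\s+(?P<comm>\S+): page allocation failure: "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)\((?P<flags>[^)]*)\)"
    r".*?cpuset=(?P<cpuset>[^,\s]+)"
)


def parse_alloc_failures(lines: Iterable[str]) -> List[AllocFailure]:
    """Extract every 'page allocation failure' record from dmesg-style lines."""
    failures = []
    for line in lines:
        m = ALLOC_FAILURE_RE.search(line)
        if m:
            failures.append(AllocFailure(m["comm"], int(m["order"]),
                                         m["mode"], m["flags"], m["cpuset"]))
    return failures


def count_by_task(failures: Iterable[AllocFailure]) -> Counter:
    """How often each task name appears among the failures."""
    return Counter(f.comm for f in failures)
```

Feeding it a saved dmesg capture (for example parse_alloc_failures(open("dmesg.txt")), with a hypothetical file name) shows whether the failures cluster in container processes or in softirq context.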

Hi. This is most likely a kernel issue, not a ZeroTier one. ZeroTier uses the tun/tap module provided by the Linux kernel (/dev/net/tun), and that’s it. There is no kernel-level code in ZeroTier itself on Linux.
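To illustrate the point, this is roughly how any userspace program obtains a TAP interface from the kernel's tun/tap driver: open /dev/net/tun and issue the TUNSETIFF ioctl. It is a minimal sketch, not ZeroTier's code; the constants are the values defined in Linux's <linux/if_tun.h>, and the interface name in the usage line is hypothetical. After the ioctl, frames the program writes to the file descriptor are injected into the kernel network stack, so everything below that point (IP processing, routing and the wwan0 driver) runs in the kernel.

```python
import fcntl
import os
import struct

# Values from <linux/if_tun.h> on Linux.
TUNSETIFF = 0x400454CA  # _IOW('T', 202, int)
IFF_TAP = 0x0002        # Ethernet-level (TAP) device rather than IP-level (TUN)
IFF_NO_PI = 0x1000      # no extra packet-information header on each frame

IFNAMSIZ = 16  # interface name buffer size, including the trailing NUL


def build_ifreq(name: str, flags: int) -> bytes:
    """Pack the leading fields of struct ifreq: a 16-byte name and the short flags."""
    encoded = name.encode()
    if len(encoded) >= IFNAMSIZ:
        raise ValueError("interface name too long")
    return struct.pack("16sH", encoded, flags)


def open_tap(name: str) -> int:
    """Create or attach to a TAP interface and return its file descriptor.

    Requires CAP_NET_ADMIN. Each read() on the fd returns one Ethernet frame
    that the kernel sent to the interface; each write() injects one frame.
    """
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name, IFF_TAP | IFF_NO_PI))
    except OSError:
        os.close(fd)
        raise
    return fd
```

For example, fd = open_tap("tapdemo0") as root, followed (once the interface is up) by os.read(fd, 2048) to receive one frame sent to that interface.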

Given that you report this only happens on your GSM interface, I’d wager the issue is somewhere in the driver for that interface. Beyond that, I cannot say.
