Linux kernel crash report

Source: syzbot/LKML email report, HEAD commit 6596a02b2078 (Merge tag ‘drm-next-2026-04-22’), upstream git tree.

Oops-Analysis: http://oops.fenrus.org/reports/email/69e87e0e.a00a0220.9259.001c.GAE_google.com/report.html


Key elements

Field Value Implication
UNAME syzkaller #0 Upstream kernel, exact commit 6596a02b2078
PROCESS syz.6.3284 (PID 17707) syzkaller fuzzer process
TAINT L (SOFTLOCKUP) Indicates a system stall prior to crash
HARDWARE QEMU Standard PC (Q35 + ICH9, 2009) QEMU VM used by syzbot
BIOS 1.16.3-debian-1.16.3-2 04/01/2014
MSGID <69e87e0e.a00a0220.9259.001c.GAE@google.com>
MSGID_URL 69e87e0e.a00a0220.9259.001c.GAE@google.com
SOURCEDIR oops-workdir/linux at commit 6596a02b2078 Exact commit available
VMLINUX Not available locally (syzbot asset would need downloading) Source-only analysis
INTRODUCED-BY f1327abd6abe — “RDMA/rxe: Support RDMA link creation and destruction per net namespace” Introduced race condition

Kernel modules

Module Flags Backtrace Location Flag Implication
(module list not available in this report)

Backtrace

Address Function Offset Size Context Module Source location
(RIP) iput.part.0 +0xa94 0xf50 Task (built-in) fs/inode.c:1980
iput +0x35 0x40 Task (built-in) fs/inode.c:1975
__sock_release (inlined) Task (built-in) net/socket.c:734
sock_release +0x169 0x1c0 Task (built-in) net/socket.c:750
rxe_release_udp_tunnel (inlined) Task (built-in) drivers/infiniband/sw/rxe/rxe_net.c:294
rxe_sock_put +0xae 0x130 Task (built-in) drivers/infiniband/sw/rxe/rxe_net.c:639
rxe_net_del +0x83 0x120 Task (built-in) drivers/infiniband/sw/rxe/rxe_net.c:660
rxe_dellink +0x15 0x20 Task (built-in) drivers/infiniband/sw/rxe/rxe.c:254
nldev_dellink +0x289 0x3c0 Task (built-in) drivers/infiniband/core/nldev.c:1849
rdma_nl_rcv_msg +0x392 0x6f0 Task (built-in) drivers/infiniband/core/netlink.c:195
rdma_nl_rcv_skb.constprop.0.isra.0 +0x2cb 0x410 Task (built-in) drivers/infiniband/core/netlink.c:239
netlink_unicast_kernel (inlined) Task (built-in) net/netlink/af_netlink.c:1318
netlink_unicast +0x585 0x850 Task (built-in) net/netlink/af_netlink.c:1344
netlink_sendmsg +0x8b0 0xda0 Task (built-in) net/netlink/af_netlink.c:1894
sock_sendmsg_nosec (inlined) Task (built-in) net/socket.c:787
__sock_sendmsg (inlined) Task (built-in) net/socket.c:802
____sys_sendmsg +0x9e1 0xb70 Task (built-in) net/socket.c:2698
___sys_sendmsg +0x190 0x1e0 Task (built-in) net/socket.c:2752
__sys_sendmsg +0x170 0x220 Task (built-in) net/socket.c:2784
do_syscall_x64 (inlined) Task (built-in) arch/x86/entry/syscall_64.c:63
do_syscall_64 +0x10b 0xf80 Task (built-in) arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe +0x77 0x7f Task (built-in) arch/x86/entry/common.h

CPU Registers

RIP: 0010:iput.part.0+0xa94/0xf50 fs/inode.c:1980
RSP: 0018:ffffc90005107128    EFLAGS: 00010296
RAX: 0000000000000000  RBX: ffff888059f79900  RCX: 0000000000000000
RDX: 0000000000000000  RSI: ffffffff81250760  RDI: fffff52000a20de7
RBP: 0000000000000300  R12: ffff888059f79860  R13: ffffffff90dc56e4
R14: ffff888059f799d0  R15: dffffc0000000000
CR0: 0000000080050033  CR2: 00007f3056de9f00  CR3: 0000000058d5d000
CR4: 0000000000352ef0

Notable registers:

RBX = ffff888059f79900 — the struct inode * being put (already freed sockfs inode)
RBP = 0000000000000300 — inode state snapshot at the check: I_FREEING (0x100) | I_CLEAR (0x200)
R15 = dffffc0000000000 — KASAN shadow base constant (KASAN-instrumented build)

Code byte line extraction

Code: 88 76 ff 48 c7 c6 60 9a c5 8b 48 89 df e8 74 68 ff ff 90 0f 0b
      e8 ac 88 76 ff 48 c7 c6 40 8f c5 8b 48 89 df e8 5d 68 ff ff 90
      <0f> 0b  e8 95 88 76 ff 48 c7 c6 00 9a c5 8b 48 89 df e8 46 68 ff ff

Decoded (via scripts/decodecode):

  ...
  13:  0f 0b    ud2          ; first BUG() in the sequence
  ...
  2a:* 0f 0b    ud2          ; <-- TRAPPING INSTRUCTION (RIP points here)
  2c:  e8 ...   call ...     ; next statement after the trapping ud2

The trapping instruction is ud2 — the x86 encoding of BUG(). This is the invalid-opcode trap inserted by the VFS_BUG_ON_INODE macro at fs/inode.c:1980. The surrounding pattern (multiple ud2 instructions separated by calls) is characteristic of consecutive VFS_BUG_ON_INODE assertions in iput().


Backtrace source code

1. iput.part.0 — crash site (fs/inode.c:1980)

fs/inode.c at commit 6596a02b2078

1972 void iput(struct inode *inode)
1973 {
1974     might_sleep();
1975     if (unlikely(!inode))
1976         return;
1977
1978 retry:
1979     lockdep_assert_not_held(&inode->i_lock);
1980     VFS_BUG_ON_INODE(inode_state_read_once(inode) & (I_FREEING | I_CLEAR), inode);
         // ← CRASH HERE: inode (RBX=ffff888059f79900) has state 0x300
         //               I_FREEING(0x100)|I_CLEAR(0x200) — already fully freed
         //               i_count=0 — no live references remain
         ...
2010 }

The VFS_BUG_ON_INODE macro expands to a BUG() when the kernel is built with CONFIG_DEBUG_VFS, which emits a ud2 instruction. The crash confirms that iput() is being called on an inode that is already in state I_FREEING | I_CLEAR — the inode was freed by a prior call and the reference count is already 0.

2. iput — wrapper (fs/inode.c:1975)

fs/inode.c at commit 6596a02b2078

1972 void iput(struct inode *inode)
1973 {
1974     might_sleep();
1975     if (unlikely(!inode))   // ← call here (wrapper iput → iput.part.0)
1976         return;
         ...

The public iput() at +0x35/0x40 is the thin wrapper that dispatches to iput.part.0. The offset +0x35 places execution just past the NULL guard.

3. __sock_release — iput call site (net/socket.c:734)

net/socket.c at commit 6596a02b2078

713 static void __sock_release(struct socket *sock, struct inode *inode)
714 {
715     const struct proto_ops *ops = READ_ONCE(sock->ops);
716
717     if (ops) {
         ...
722         ops->release(sock);
723         sock->sk = NULL;
         ...
728     }
729
730     if (sock->wq.fasync_list)
731         pr_err("%s: fasync list not empty!\n", __func__);
732
733     if (!sock->file) {
734         iput(SOCK_INODE(sock));   // ← call here — drops inode ref for kernel socket
735         return;
736     }
737     WRITE_ONCE(sock->file, NULL);
738 }
739
740 ...
748 void sock_release(struct socket *sock)
749 {
750     __sock_release(sock, NULL);   // ← sock_release call site
751 }

sock_release() calls __sock_release() with inode=NULL, so line 733 (!sock->file) is TRUE for kernel sockets (they have no file backing), and iput(SOCK_INODE(sock)) is called unconditionally. When the inode has already been freed by a previous sock_release, this second iput triggers the BUG.

4. rxe_sock_put — release path (drivers/infiniband/sw/rxe/rxe_net.c:639)

drivers/infiniband/sw/rxe/rxe_net.c at commit 6596a02b2078

291 static void rxe_release_udp_tunnel(struct socket *sk)
292 {
293     if (sk)
294         udp_tunnel_sock_release(sk);   // ← inlined call at rxe_net.c:294
295 }
296
     ...

632 static void rxe_sock_put(struct sock *sk,
633                         void (*set_sk)(struct net *, struct sock *),
634                         struct net *net)
635 {
636     if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
637         __sock_put(sk);
638     } else {
639         rxe_release_udp_tunnel(sk->sk_socket);  // ← call here (via inline)
         // → udp_tunnel_sock_release → sock_release → __sock_release → iput
         //   CRASH: inode already I_FREEING|I_CLEAR from prior release
640         sk = NULL;
641         set_sk(net, sk);   // NOTE: pointer is cleared AFTER releasing
642     }
643 }

What-How-Where analysis

What

The kernel BUG at fs/inode.c:1980 is triggered by VFS_BUG_ON_INODE, an assertion that fires when iput() is called on an inode that is already fully freed (I_FREEING | I_CLEAR, state 0x300). The inode belongs to a sockfs socket — the per-network-namespace UDP tunnel socket used by the RXE soft-RoCE driver for encapsulating RDMA traffic.

Register evidence:

- RBX = ffff888059f79900 is the struct inode * of the (already freed) sockfs inode.
- RBP = 0x300 is the inode state snapshot at the check, confirming I_FREEING | I_CLEAR.
- The VFS_BUG_ON_INODE line in the oops header shows state:0x300 count:0, i.e., the inode reference count is already zero — the inode has been completely freed.

In plain terms: iput() is called twice on the same sockfs inode, and the second call happens after the inode has already been freed by the first.

How

Q1: Who released the socket the first time?

A1: rxe_ns_exit() in drivers/infiniband/sw/rxe/rxe_ns.c, the pernet cleanup callback registered via register_pernet_subsys(&rxe_net_ops). When a network namespace is torn down, rxe_ns_exit() reads the stored UDP socket pointer from the per-namespace rxe_ns_sock structure and calls udp_tunnel_sock_release(sk->sk_socket):

 38 static void rxe_ns_exit(struct net *net)
 39 {
 40     struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
 41     struct sock *sk;
 42
 45     rcu_read_lock();
 46     sk = rcu_dereference(ns_sk->rxe_sk4);   // reads pointer
 47     rcu_read_unlock();
 48     if (sk) {
 49         rcu_assign_pointer(ns_sk->rxe_sk4, NULL);
 50         udp_tunnel_sock_release(sk->sk_socket);  // ← first release
 51     }
 ...
 62 }

Q2: Who released the socket the second time (the crash)?

A2: rxe_sock_put() in rxe_net.c, called from rxe_net_del() ← rxe_dellink() ← nldev_dellink(), triggered by the user-space sendmsg to the RDMA netlink socket (explicit deletion of the RXE device):

632 static void rxe_sock_put(struct sock *sk,
633                         void (*set_sk)(struct net *, struct sock *),
634                         struct net *net)
635 {
636     if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
637         __sock_put(sk);
638     } else {
639         rxe_release_udp_tunnel(sk->sk_socket);  // ← second release → CRASH
640         sk = NULL;
641         set_sk(net, sk);   // clears pointer AFTER releasing — too late
642     }
643 }

Q3: How does the race happen?

A3: Both code paths independently read the namespace socket pointer and decide to release the socket, with no synchronisation between them:

CPU 0 (namespace teardown)           CPU 1 (nldev_dellink)
─────────────────────────────────    ──────────────────────────────────
rxe_ns_exit():                       rxe_net_del() → rxe_sock_put():
  rcu_read_lock()                      sk = rxe_ns_pernet_sk4(net)
  sk = rcu_dereference(ns_sk->rxe_sk4)   // reads non-NULL pointer
  rcu_read_unlock()                        ← both see non-NULL sk ←
  rcu_assign_pointer(ns_sk->rxe_sk4, NULL)
  udp_tunnel_sock_release(sk->sk_socket)
    → sock_release → iput            // inode freed, I_FREEING|I_CLEAR set
                                     rxe_release_udp_tunnel(sk->sk_socket)
                                       → sock_release → iput → BUG!

The specific vulnerability is twofold:

1. rxe_ns_exit() sets the namespace pointer to NULL before releasing the socket, but after it has already read the pointer into a local variable. If rxe_net_del() reads the pointer in the window between the rcu_read_unlock() and the rcu_assign_pointer(…, NULL) call — or before either path sets it to NULL — both paths hold a live copy of the pointer.
2. rxe_sock_put() clears the namespace pointer after releasing the socket (line 641 runs after line 639). This opens a second window in which the socket has already been freed but the namespace pointer still points to it. If rxe_ns_exit() reads the pointer during this window, it gets a dangling pointer to a freed socket and tries to release it again.

There is no locking between the two release paths. The race was introduced by commit f1327abd6abe (March 2026), which added per-network-namespace socket management for RXE devices, together with companion commit 13f2a53c2a71 which added rxe_ns.c/rxe_ns.h.

Where

The fix must ensure that the socket is released at most once, regardless of whether the release comes from rxe_ns_exit() (namespace teardown) or rxe_net_del() (explicit device deletion). The two operations must be serialized.

Proposed fix: add a struct mutex to struct rxe_ns_sock that serializes the read-and-clear of the socket pointer. The actual udp_tunnel_sock_release() can run outside the lock (it sleeps), but the “claim” (read + set-NULL) must be atomic:

--- a/drivers/infiniband/sw/rxe/rxe_ns.c
+++ b/drivers/infiniband/sw/rxe/rxe_ns.c
@@ -15,18 +15,42 @@
 struct rxe_ns_sock {
    struct sock __rcu *rxe_sk4;
    struct sock __rcu *rxe_sk6;
+   struct mutex lock; /* serializes socket claim/release */
 };
 
 static unsigned int rxe_pernet_id;
 
 static int rxe_ns_init(struct net *net)
 {
+   struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+   mutex_init(&ns_sk->lock);
    return 0;
 }
 
+/*
+ * Atomically claim the socket pointer from the namespace: read it and set it
+ * to NULL under the lock, then return the pointer (or NULL if already gone).
+ * The caller is responsible for releasing the socket after the lock is dropped.
+ */
+static struct sock *rxe_ns_claim_sk(struct rxe_ns_sock *ns_sk,
+                   struct sock __rcu **sk_rcu)
+{
+   struct sock *sk;
+
+   mutex_lock(&ns_sk->lock);
+   sk = rcu_dereference_protected(*sk_rcu, lockdep_is_held(&ns_sk->lock));
+   if (sk)
+       rcu_assign_pointer(*sk_rcu, NULL);
+   mutex_unlock(&ns_sk->lock);
+   return sk;
+}
+
 static void rxe_ns_exit(struct net *net)
 {
    struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
    struct sock *sk;
 
-   rcu_read_lock();
-   sk = rcu_dereference(ns_sk->rxe_sk4);
-   rcu_read_unlock();
-   if (sk) {
-       rcu_assign_pointer(ns_sk->rxe_sk4, NULL);
-       udp_tunnel_sock_release(sk->sk_socket);
-   }
+   sk = rxe_ns_claim_sk(ns_sk, &ns_sk->rxe_sk4);
+   if (sk)
+       udp_tunnel_sock_release(sk->sk_socket);
 
 #if IS_ENABLED(CONFIG_IPV6)
-   rcu_read_lock();
-   sk = rcu_dereference(ns_sk->rxe_sk6);
-   rcu_read_unlock();
-   if (sk) {
-       rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
-       udp_tunnel_sock_release(sk->sk_socket);
-   }
+   sk = rxe_ns_claim_sk(ns_sk, &ns_sk->rxe_sk6);
+   if (sk)
+       udp_tunnel_sock_release(sk->sk_socket);
 #endif
 }

Expose the claim helper to rxe_net.c via rxe_ns.h:

--- a/drivers/infiniband/sw/rxe/rxe_ns.h
+++ b/drivers/infiniband/sw/rxe/rxe_ns.h
@@ -5,6 +5,8 @@
 
 struct sock *rxe_ns_pernet_sk4(struct net *net);
 void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk);
+struct sock *rxe_ns_pernet_take_sk4(struct net *net);
+struct sock *rxe_ns_pernet_take_sk6(struct net *net);
 
 #if IS_ENABLED(CONFIG_IPV6)
 void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk);

Add the implementations in rxe_ns.c:

struct sock *rxe_ns_pernet_take_sk4(struct net *net)
{
    struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);

    return rxe_ns_claim_sk(ns_sk, &ns_sk->rxe_sk4);
}

struct sock *rxe_ns_pernet_take_sk6(struct net *net)
{
#if IS_ENABLED(CONFIG_IPV6)
    struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);

    return rxe_ns_claim_sk(ns_sk, &ns_sk->rxe_sk6);
#else
    return NULL;
#endif
}

And update rxe_net_del() to use the atomic-claim helpers, replacing the racy read-then-put pattern:

--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -645,13 +645,14 @@ void rxe_net_del(struct ib_device *dev)
    net = dev_net(ndev);
 
-   sk = rxe_ns_pernet_sk4(net);
+   /* Atomically claim (read + null) the pointer so rxe_ns_exit()
+    * cannot race and double-release the same socket. */
+   sk = rxe_ns_pernet_take_sk4(net);
    if (sk)
        rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
 
-   sk = rxe_ns_pernet_sk6(net);
+   sk = rxe_ns_pernet_take_sk6(net);
    if (sk)
        rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);

With this change rxe_sock_put no longer needs to call set_sk(net, NULL) (the pointer was already cleared by the claim helper), but it can be left in place as a harmless no-op (setting NULL to NULL).

Why this is correct: rxe_ns_claim_sk uses the mutex to make the read-and-null operation atomic. Exactly one of the two racing paths — rxe_ns_exit or rxe_net_del — gets a non-NULL pointer back and proceeds to release the socket; the other gets NULL and does nothing. After rcu_assign_pointer(…, NULL) inside rxe_ns_claim_sk, the synchronize_rcu() in udp_tunnel_sock_release (which runs after sk_user_data is cleared) ensures that no concurrent RCU reader can still observe the old pointer value by the time the socket is torn down.


Bug introduction

The bug was introduced by the following two commits in the series merged in March 2026:

Primary: f1327abd6abe — “RDMA/rxe: Support RDMA link creation and destruction per net namespace”. Author: Zhu Yanjun <yanjun.zhu@linux.dev>, Date: 2026-03-12.

This commit introduced rxe_sock_put() and rxe_net_del() with the racy socket release pattern: the namespace socket pointer is not cleared atomically before the socket is released, and there is no coordination with the rxe_ns_exit() path.

Supporting: 13f2a53c2a71 — “RDMA/rxe: Add net namespace support for IPv4/IPv6 sockets”. Author: Zhu Yanjun <yanjun.zhu@linux.dev>, Date: 2026-03-12.

This commit introduced rxe_ns.c with rxe_ns_exit() — the second release path, which reads the socket pointer under an RCU read lock but does not hold any lock that would prevent rxe_net_del from reading the same pointer concurrently.

No upstream fix has been identified for this specific race within the current search budget. The commits that appeared in the search (RDMA/rxe: Fix double free in rxe_srq_from_init) address a different double-free in the SRQ allocation path and do not cover the UDP socket race.


Analysis, conclusions and recommendations

Summary: The crash is a race-condition double-free of a sockfs inode. The UDP tunnel socket shared per-network-namespace by RXE devices can be released twice: once by rxe_ns_exit() (namespace teardown) and once by rxe_sock_put() (explicit RXE device deletion via nldev_dellink). Both code paths independently read the same namespace socket pointer and call sock_release() without any mutual exclusion. The second sock_release() → iput() call triggers the VFS_BUG_ON_INODE assertion because the sockfs inode is already in I_FREEING | I_CLEAR state.

Root cause commits: f1327abd6abe and 13f2a53c2a71, both authored by Zhu Yanjun, merged in March 2026.

Confidence: High. The crash site, register values (RBP=0x300), inode state, and the complete call chain are all consistent with a double-sock_release() on the RXE UDP tunnel socket. The introduced commits are the only changes to this code path that could cause this race.

Recommendations:

1. Add a mutex to struct rxe_ns_sock and use it to serialize the read-and-clear of the socket pointer in both rxe_ns_exit() and the rxe_net_del() path (see the proposed fix in the Where section above).
2. Report and submit the fix upstream to the RDMA mailing list (linux-rdma@vger.kernel.org), CC: Zhu Yanjun, Leon Romanovsky, David Ahern.
3. Review the SK_REF_FOR_TUNNEL = 2 reference-count boundary in rxe_sock_put(): with two devices sharing a socket (refcount = 2), 2 > 2 is FALSE and the socket is fully released instead of just dropping one reference. This appears to be a separate off-by-one, but it is not the direct cause of this crash.