Source: syzbot/LKML email report, HEAD commit 6596a02b2078 (Merge tag 'drm-next-2026-04-22'), upstream git tree.
Oops-Analysis: http://oops.fenrus.org/reports/email/69e87e0e.a00a0220.9259.001c.GAE_google.com/report.html
| Field | Value | Implication |
|---|---|---|
| UNAME | syzkaller #0 | Upstream kernel, exact commit 6596a02b2078 |
| PROCESS | syz.6.3284 (PID 17707) | syzkaller fuzzer process |
| TAINT | L (SOFTLOCKUP) | Indicates a system stall prior to crash |
| HARDWARE | QEMU Standard PC (Q35 + ICH9, 2009) | QEMU VM used by syzbot |
| BIOS | 1.16.3-debian-1.16.3-2 04/01/2014 | |
| MSGID | <69e87e0e.a00a0220.9259.001c.GAE@google.com> | |
| MSGID_URL | 69e87e0e.a00a0220.9259.001c.GAE@google.com | |
| SOURCEDIR | oops-workdir/linux at commit 6596a02b2078 | Exact commit available |
| VMLINUX | Not available locally (syzbot asset would need downloading) | Source-only analysis |
| INTRODUCED-BY | f1327abd6abe: “RDMA/rxe: Support RDMA link creation and destruction per net namespace” | Introduced race condition |
| Module | Flags | Backtrace | Location | Flag Implication |
|---|---|---|---|---|
| (module list not available in this report) |
| Address | Function | Offset | Size | Context | Module | Source location |
|---|---|---|---|---|---|---|
| (RIP) | iput.part.0 | +0xa94 | 0xf50 | Task | (built-in) | fs/inode.c:1980 |
| | iput | +0x35 | 0x40 | Task | (built-in) | fs/inode.c:1975 |
| | __sock_release (inlined) | | | Task | (built-in) | net/socket.c:734 |
| | sock_release | +0x169 | 0x1c0 | Task | (built-in) | net/socket.c:750 |
| | rxe_release_udp_tunnel (inlined) | | | Task | (built-in) | drivers/infiniband/sw/rxe/rxe_net.c:294 |
| | rxe_sock_put | +0xae | 0x130 | Task | (built-in) | drivers/infiniband/sw/rxe/rxe_net.c:639 |
| | rxe_net_del | +0x83 | 0x120 | Task | (built-in) | drivers/infiniband/sw/rxe/rxe_net.c:660 |
| | rxe_dellink | +0x15 | 0x20 | Task | (built-in) | drivers/infiniband/sw/rxe/rxe.c:254 |
| | nldev_dellink | +0x289 | 0x3c0 | Task | (built-in) | drivers/infiniband/core/nldev.c:1849 |
| | rdma_nl_rcv_msg | +0x392 | 0x6f0 | Task | (built-in) | drivers/infiniband/core/netlink.c:195 |
| | rdma_nl_rcv_skb.constprop.0.isra.0 | +0x2cb | 0x410 | Task | (built-in) | drivers/infiniband/core/netlink.c:239 |
| | netlink_unicast_kernel (inlined) | | | Task | (built-in) | net/netlink/af_netlink.c:1318 |
| | netlink_unicast | +0x585 | 0x850 | Task | (built-in) | net/netlink/af_netlink.c:1344 |
| | netlink_sendmsg | +0x8b0 | 0xda0 | Task | (built-in) | net/netlink/af_netlink.c:1894 |
| | sock_sendmsg_nosec (inlined) | | | Task | (built-in) | net/socket.c:787 |
| | __sock_sendmsg (inlined) | | | Task | (built-in) | net/socket.c:802 |
| | ____sys_sendmsg | +0x9e1 | 0xb70 | Task | (built-in) | net/socket.c:2698 |
| | ___sys_sendmsg | +0x190 | 0x1e0 | Task | (built-in) | net/socket.c:2752 |
| | __sys_sendmsg | +0x170 | 0x220 | Task | (built-in) | net/socket.c:2784 |
| | do_syscall_x64 (inlined) | | | Task | (built-in) | arch/x86/entry/syscall_64.c:63 |
| | do_syscall_64 | +0x10b | 0xf80 | Task | (built-in) | arch/x86/entry/syscall_64.c:94 |
| | entry_SYSCALL_64_after_hwframe | +0x77 | 0x7f | Task | (built-in) | arch/x86/entry/common.h |
RIP: 0010:iput.part.0+0xa94/0xf50 fs/inode.c:1980
RSP: 0018:ffffc90005107128 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff888059f79900 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81250760 RDI: fffff52000a20de7
RBP: 0000000000000300 R12: ffff888059f79860 R13: ffffffff90dc56e4
R14: ffff888059f799d0 R15: dffffc0000000000
CR0: 0000000080050033 CR2: 00007f3056de9f00 CR3: 0000000058d5d000
CR4: 0000000000352ef0
Notable registers:
- RBX = ffff888059f79900: the struct inode * passed into iput(),
  confirmed by the VFS_BUG_ON_INODE output (inode:ffff888059f79900).
- RBP = 0x300: holds the inode state at the BUG_ON check site.
  0x300 = I_FREEING (0x100) | I_CLEAR (0x200), meaning the inode has
  already been fully freed. This directly confirms the double-free.
- R12 = ffff888059f79860: inode - 0xa0 = socket or related sockfs
  struct (the struct socket that sockfs allocates alongside the inode).
Code: 88 76 ff 48 c7 c6 60 9a c5 8b 48 89 df e8 74 68 ff ff 90 0f 0b
e8 ac 88 76 ff 48 c7 c6 40 8f c5 8b 48 89 df e8 5d 68 ff ff 90
<0f> 0b e8 95 88 76 ff 48 c7 c6 00 9a c5 8b 48 89 df e8 46 68 ff ff
Decoded (via scripts/decodecode):
...
13: 0f 0b ud2 ; first BUG() in the sequence
...
2a:* 0f 0b ud2 ; <-- TRAPPING INSTRUCTION (RIP points here)
2c: e8 ... call ... ; next statement after the trapping ud2
The trapping instruction is ud2 — the x86 encoding of
BUG(). This is the invalid-opcode trap inserted by the
VFS_BUG_ON_INODE macro at fs/inode.c:1980. The
surrounding pattern (multiple ud2 instructions separated by
calls) is characteristic of consecutive
VFS_BUG_ON_INODE assertions in iput().
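The byte offsets in the decoded listing can be cross-checked by hand. The following is a minimal user-space sketch, not a replacement for scripts/decodecode: it parses the Code: line from this report, records which byte carries the <..> marker, and searches for the two-byte ud2 encoding (0f 0b). The helper names parse_code and find_ud2 are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Code: line copied from the report; <..> marks the byte RIP points at. */
static const char oops_code[] =
    "88 76 ff 48 c7 c6 60 9a c5 8b 48 89 df e8 74 68 ff ff 90 0f 0b "
    "e8 ac 88 76 ff 48 c7 c6 40 8f c5 8b 48 89 df e8 5d 68 ff ff 90 "
    "<0f> 0b e8 95 88 76 ff 48 c7 c6 00 9a c5 8b 48 89 df e8 46 68 ff ff";

/*
 * Convert the hex dump into bytes. Returns the number of bytes stored and
 * writes the index of the marked byte to *rip_idx (-1 if there is no marker).
 */
static size_t parse_code(const char *s, unsigned char *out, size_t cap,
                         long *rip_idx)
{
    size_t n = 0;

    assert(s && out && rip_idx);
    *rip_idx = -1;
    while (*s && n < cap) {
        if (*s == '<') {                       /* next byte is the RIP byte */
            *rip_idx = (long)n;
            s++;
        } else if (isxdigit((unsigned char)*s)) {
            char *end;
            out[n++] = (unsigned char)strtoul(s, &end, 16);
            s = end;
        } else {
            s++;                               /* skip spaces and '>' */
        }
    }
    return n;
}

/* Index of the first ud2 (0f 0b) at or after `from`, or -1 if none. */
static long find_ud2(const unsigned char *b, size_t n, size_t from)
{
    for (size_t i = from; i + 1 < n; i++)
        if (b[i] == 0x0f && b[i + 1] == 0x0b)
            return (long)i;
    return -1;
}
```

Run against the dump above, the marked byte lands at offset 0x2a and coincides with the second ud2, while the first ud2 sits at 0x13, matching the decoded listing.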
iput.part.0, crash site (fs/inode.c:1980)
fs/inode.c at commit 6596a02b2078
1972 void iput(struct inode *inode)
1973 {
1974 might_sleep();
1975 if (unlikely(!inode))
1976 return;
1977
1978 retry:
1979 lockdep_assert_not_held(&inode->i_lock);
1980 VFS_BUG_ON_INODE(inode_state_read_once(inode) & (I_FREEING | I_CLEAR), inode);
// ← CRASH HERE: inode (RBX=ffff888059f79900) has state 0x300
// I_FREEING(0x100)|I_CLEAR(0x200) — already fully freed
// i_count=0 — no live references remain
...
2010 }
The VFS_BUG_ON_INODE macro expands to a BUG() when the kernel is built
with CONFIG_DEBUG_VFS, which emits a ud2 instruction. The crash
confirms that iput() is being called on an inode that is already in
state I_FREEING | I_CLEAR: the inode was freed by a prior call and the
reference count is already 0.
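The state test behind the assertion is a plain bit-mask check. The sketch below reproduces it in user space, assuming the flag values quoted in this report (I_FREEING = 0x100, I_CLEAR = 0x200); those values are taken from the analysis above and are not re-verified against the kernel headers. The SKETCH_ names and the helper are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as quoted in this report, not re-checked against the kernel. */
#define SKETCH_I_FREEING 0x100u
#define SKETCH_I_CLEAR   0x200u

/* The report's reading of RBP: both bits together give 0x300. */
static_assert((SKETCH_I_FREEING | SKETCH_I_CLEAR) == 0x300u,
              "mask must match the state value seen in RBP");

/* True when the condition guarded by the assertion at line 1980 holds. */
static bool iput_assert_would_fire(unsigned int state)
{
    return (state & (SKETCH_I_FREEING | SKETCH_I_CLEAR)) != 0;
}
```

Either bit alone is enough to trip the check, so the observed 0x300 satisfies the condition twice over.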
iput (fs/inode.c:1975)
fs/inode.c at commit 6596a02b2078
1972 void iput(struct inode *inode)
1973 {
1974 might_sleep();
1975 if (unlikely(!inode)) // ← call here (wrapper iput → iput.part.0)
1976 return;
...
The public iput() at +0x35/0x40 is the thin wrapper that dispatches to
iput.part.0. The offset +0x35 places execution just past the NULL
guard.
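The symbol+offset/size notation used throughout the backtrace reads as "offset bytes into a function that is size bytes long". As a small illustration, the sketch below splits such a string into its three parts with sscanf; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

struct sym_loc {
    char name[64];          /* function name, e.g. "iput.part.0" */
    unsigned long offset;   /* bytes from the start of the function */
    unsigned long size;     /* total size of the function in bytes */
};

/* Parse "name+0xOFF/0xSIZE". Returns 0 on success, -1 on malformed input. */
static int parse_sym(const char *s, struct sym_loc *out)
{
    assert(s && out);
    if (sscanf(s, "%63[^+]+%lx/%lx", out->name, &out->offset, &out->size) != 3)
        return -1;
    /* An offset must lie inside the function it belongs to. */
    return out->offset < out->size ? 0 : -1;
}
```

For iput+0x35/0x40 this leaves 0xb bytes after the recorded address, consistent with a short wrapper whose only real work is the call into iput.part.0.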
__sock_release (net/socket.c:734)
net/socket.c at commit 6596a02b2078
713 static void __sock_release(struct socket *sock, struct inode *inode)
714 {
715 const struct proto_ops *ops = READ_ONCE(sock->ops);
716
717 if (ops) {
...
722 ops->release(sock);
723 sock->sk = NULL;
...
728 }
729
730 if (sock->wq.fasync_list)
731 pr_err("%s: fasync list not empty!\n", __func__);
732
733 if (!sock->file) {
734 iput(SOCK_INODE(sock)); // ← call here — drops inode ref for kernel socket
735 return;
736 }
737 WRITE_ONCE(sock->file, NULL);
738 }
739
740 ...
748 void sock_release(struct socket *sock)
749 {
750 __sock_release(sock, NULL); // ← sock_release call site
751 }
sock_release() calls __sock_release() with inode=NULL. Kernel sockets
have no backing file, so the !sock->file test at line 733 is true and
iput(SOCK_INODE(sock)) runs on every release. When the inode has
already been freed by a previous sock_release(), this second iput()
triggers the BUG.
rxe_sock_put (rxe_net.c:639)
drivers/infiniband/sw/rxe/rxe_net.c at commit 6596a02b2078
291 static void rxe_release_udp_tunnel(struct socket *sk)
292 {
293 if (sk)
294 udp_tunnel_sock_release(sk); // ← inlined call at rxe_net.c:294
295 }
296
...
632 static void rxe_sock_put(struct sock *sk,
633 void (*set_sk)(struct net *, struct sock *),
634 struct net *net)
635 {
636 if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
637 __sock_put(sk);
638 } else {
639 rxe_release_udp_tunnel(sk->sk_socket); // ← call here (via inline)
// → udp_tunnel_sock_release → sock_release → __sock_release → iput
// CRASH: inode already I_FREEING|I_CLEAR from prior release
640 sk = NULL;
641 set_sk(net, sk); // NOTE: pointer is cleared AFTER releasing
642 }
643 }
The kernel BUG at fs/inode.c:1980 is triggered by VFS_BUG_ON_INODE, an
assertion that fires when iput() is called on an inode that is already
fully freed (I_FREEING | I_CLEAR, state 0x300). The inode belongs to a
sockfs socket: the per-network-namespace UDP tunnel socket used by the
RXE soft-RoCE driver for encapsulating RDMA traffic.
Register evidence:
- RBX = ffff888059f79900 is the struct inode * of the (already freed)
  sockfs inode.
- RBP = 0x300 is the inode state snapshot at the check, confirming
  I_FREEING|I_CLEAR.
- The VFS_BUG_ON_INODE line in the oops header shows state:0x300
  count:0, i.e., the inode reference count is already zero and the
  inode has been completely freed.
In plain terms: iput() is called twice on the same
sockfs inode, and the second call happens after the inode has already
been freed by the first.
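The relationship between RBX (the inode) and R12 (0xa0 bytes lower) fits the way sockfs allocates a socket and its inode in one object, with the socket first. The sketch below models that with mock types: only the field ordering mirrors the kernel's struct socket_alloc, while the 0xa0 size of the mock socket is a hypothetical value chosen to match the register delta seen in this report. All mock_ names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock layout: socket first, inode second. Sizes are hypothetical. */
struct mock_socket { char opaque[0xa0]; };
struct mock_inode  { unsigned int state; };
struct mock_socket_alloc {
    struct mock_socket socket;
    struct mock_inode  vfs_inode;
};

/* container_of-style step from the embedded inode back to its socket. */
static struct mock_socket *mock_socket_of(struct mock_inode *inode)
{
    assert(inode != NULL);
    return (struct mock_socket *)((char *)inode -
            offsetof(struct mock_socket_alloc, vfs_inode));
}

/* Distance between the two register values from the oops dump. */
static uint64_t reg_delta(uint64_t rbx, uint64_t r12)
{
    return rbx - r12;
}
```

With the register values from the dump, reg_delta(0xffff888059f79900, 0xffff888059f79860) evaluates to 0xa0, the same distance the mock layout places between the socket and its inode.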
Q1: Who released the socket the first time?
A1: rxe_ns_exit() in
drivers/infiniband/sw/rxe/rxe_ns.c, the pernet cleanup
callback registered via
register_pernet_subsys(&rxe_net_ops). When a network
namespace is torn down, rxe_ns_exit() reads the stored UDP
socket pointer from the per-namespace rxe_ns_sock structure
and calls udp_tunnel_sock_release(sk->sk_socket):
38 static void rxe_ns_exit(struct net *net)
39 {
40 struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
41 struct sock *sk;
42
45 rcu_read_lock();
46 sk = rcu_dereference(ns_sk->rxe_sk4); // reads pointer
47 rcu_read_unlock();
48 if (sk) {
49 rcu_assign_pointer(ns_sk->rxe_sk4, NULL);
50 udp_tunnel_sock_release(sk->sk_socket); // ← first release
51 }
...
62 }
Q2: Who released the socket the second time (the crash)?
A2: rxe_sock_put() in rxe_net.c, reached via nldev_dellink() →
rxe_dellink() → rxe_net_del() and triggered by the user-space sendmsg
to the RDMA netlink socket (explicit deletion of the RXE device):
632 static void rxe_sock_put(struct sock *sk,
633 void (*set_sk)(struct net *, struct sock *),
634 struct net *net)
635 {
636 if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
637 __sock_put(sk);
638 } else {
639 rxe_release_udp_tunnel(sk->sk_socket); // ← second release → CRASH
640 sk = NULL;
641 set_sk(net, sk); // clears pointer AFTER releasing — too late
642 }
643 }
Q3: How does the race happen?
A3: Both code paths independently read the namespace socket pointer and decide to release the socket, with no synchronisation between them:
CPU 0 (namespace teardown)                    CPU 1 (nldev_dellink)
─────────────────────────────────             ──────────────────────────────────
rxe_ns_exit():                                rxe_net_del() → rxe_sock_put():
  rcu_read_lock()                               sk = rxe_ns_pernet_sk4(net)
  sk = rcu_dereference(ns_sk->rxe_sk4)          // reads non-NULL pointer
  rcu_read_unlock()
                     ← both see non-NULL sk →
  rcu_assign_pointer(ns_sk->rxe_sk4, NULL)
  udp_tunnel_sock_release(sk->sk_socket)
    → sock_release → iput
    // inode freed, I_FREEING|I_CLEAR set
                                                rxe_release_udp_tunnel(sk->sk_socket)
                                                  → sock_release → iput → BUG!
The specific vulnerability is:
1. rxe_ns_exit() sets the namespace pointer to NULL before releasing
   the socket, but only after it has copied the pointer into a local
   variable. If rxe_net_del() reads the pointer between the
   rcu_read_unlock() and the rcu_assign_pointer(…, NULL) call, or at
   any point before either path stores NULL, both paths hold a live
   copy of the pointer.
2. rxe_sock_put() clears the namespace pointer after releasing the
   socket (line 641 comes after line 639). This opens a second window
   in which the socket has already been freed but the namespace pointer
   still points to it. If rxe_ns_exit() reads the pointer during this
   window, it gets a dangling pointer to a freed socket and tries to
   release it again.
There is no locking between the two release paths. The race was
introduced by commit f1327abd6abe
(March 2026), which added per-network-namespace socket management for
RXE devices, together with companion commit 13f2a53c2a71
which added rxe_ns.c/rxe_ns.h.
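The interleaving described above can be replayed deterministically on a single thread. The sketch below is a user-space model, not kernel code: race_sock counts release calls in place of sock_release(), race_ns stands in for the per-namespace pointer, and the two local variables play the roles of the two CPUs. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct race_sock { int releases; };          /* counts release calls */
struct race_ns   { struct race_sock *sk4; }; /* models ns_sk->rxe_sk4 */

static void race_release(struct race_sock *sk)
{
    sk->releases++;
}

/* Replays the racy schedule step by step; returns the release count. */
static int race_replay(void)
{
    struct race_sock sock = { 0 };
    struct race_ns ns = { &sock };

    /* Both paths load the pointer before either one clears it. */
    struct race_sock *cpu0 = ns.sk4;   /* rxe_ns_exit(): rcu_dereference() */
    struct race_sock *cpu1 = ns.sk4;   /* rxe_net_del(): rxe_ns_pernet_sk4() */

    if (cpu0) {                        /* CPU 0: clear, then release */
        ns.sk4 = NULL;
        race_release(cpu0);
    }
    if (cpu1) {                        /* CPU 1: stale copy is still non-NULL */
        race_release(cpu1);
        ns.sk4 = NULL;                 /* cleared only after the release */
    }

    assert(ns.sk4 == NULL);
    return sock.releases;
}
```

race_replay() returns 2: clearing the shared pointer does nothing for a path that already holds its own copy, which is why the clear and the read have to happen as one step.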
The fix must ensure that the socket is released at most once,
regardless of whether the release comes from rxe_ns_exit()
(namespace teardown) or rxe_net_del() (explicit device
deletion). The two operations must be serialized.
Proposed fix: add a struct mutex to
struct rxe_ns_sock that serializes the read-and-clear of
the socket pointer. The actual udp_tunnel_sock_release()
can run outside the lock (it sleeps), but the “claim” (read + set-NULL)
must be atomic:
--- a/drivers/infiniband/sw/rxe/rxe_ns.c
+++ b/drivers/infiniband/sw/rxe/rxe_ns.c
@@ -15,18 +15,42 @@
struct rxe_ns_sock {
struct sock __rcu *rxe_sk4;
struct sock __rcu *rxe_sk6;
+ struct mutex lock; /* serializes socket claim/release */
};
static unsigned int rxe_pernet_id;
static int rxe_ns_init(struct net *net)
{
+ struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
+
+ mutex_init(&ns_sk->lock);
return 0;
}
+/*
+ * Atomically claim the socket pointer from the namespace: read it and set it
+ * to NULL under the lock, then return the pointer (or NULL if already gone).
+ * The caller is responsible for releasing the socket after the lock is dropped.
+ */
+static struct sock *rxe_ns_claim_sk(struct rxe_ns_sock *ns_sk,
+ struct sock __rcu **sk_rcu)
+{
+ struct sock *sk;
+
+ mutex_lock(&ns_sk->lock);
+ sk = rcu_dereference_protected(*sk_rcu, lockdep_is_held(&ns_sk->lock));
+ if (sk)
+ rcu_assign_pointer(*sk_rcu, NULL);
+ mutex_unlock(&ns_sk->lock);
+ return sk;
+}
+
static void rxe_ns_exit(struct net *net)
{
struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
struct sock *sk;
- rcu_read_lock();
- sk = rcu_dereference(ns_sk->rxe_sk4);
- rcu_read_unlock();
- if (sk) {
- rcu_assign_pointer(ns_sk->rxe_sk4, NULL);
- udp_tunnel_sock_release(sk->sk_socket);
- }
+ sk = rxe_ns_claim_sk(ns_sk, &ns_sk->rxe_sk4);
+ if (sk)
+ udp_tunnel_sock_release(sk->sk_socket);
#if IS_ENABLED(CONFIG_IPV6)
- rcu_read_lock();
- sk = rcu_dereference(ns_sk->rxe_sk6);
- rcu_read_unlock();
- if (sk) {
- rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
- udp_tunnel_sock_release(sk->sk_socket);
- }
+ sk = rxe_ns_claim_sk(ns_sk, &ns_sk->rxe_sk6);
+ if (sk)
+ udp_tunnel_sock_release(sk->sk_socket);
#endif
}
Expose the claim helper to rxe_net.c via rxe_ns.h:
--- a/drivers/infiniband/sw/rxe/rxe_ns.h
+++ b/drivers/infiniband/sw/rxe/rxe_ns.h
@@ -5,6 +5,8 @@
struct sock *rxe_ns_pernet_sk4(struct net *net);
void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk);
+struct sock *rxe_ns_pernet_take_sk4(struct net *net);
+struct sock *rxe_ns_pernet_take_sk6(struct net *net);
#if IS_ENABLED(CONFIG_IPV6)
void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk);
Add the implementations in rxe_ns.c:
struct sock *rxe_ns_pernet_take_sk4(struct net *net)
{
struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
return rxe_ns_claim_sk(ns_sk, &ns_sk->rxe_sk4);
}
struct sock *rxe_ns_pernet_take_sk6(struct net *net)
{
#if IS_ENABLED(CONFIG_IPV6)
struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
return rxe_ns_claim_sk(ns_sk, &ns_sk->rxe_sk6);
#else
return NULL;
#endif
}
And update rxe_net_del() to use the atomic-claim helpers, replacing the
racy read-then-put pattern:
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -645,13 +645,14 @@ void rxe_net_del(struct ib_device *dev)
net = dev_net(ndev);
- sk = rxe_ns_pernet_sk4(net);
+ /* Atomically claim (read + null) the pointer so rxe_ns_exit()
+ * cannot race and double-release the same socket. */
+ sk = rxe_ns_pernet_take_sk4(net);
if (sk)
rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
- sk = rxe_ns_pernet_sk6(net);
+ sk = rxe_ns_pernet_take_sk6(net);
if (sk)
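rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
With this change rxe_sock_put() no longer needs to call
set_sk(net, NULL) on the release branch (the claim helper already
cleared the pointer); the call can stay as a harmless no-op that stores
NULL over NULL. One caveat: the claim helper also clears the pointer
when rxe_sock_put() takes the __sock_put() branch and the socket stays
alive. If other RXE devices in the namespace still look the socket up
through the per-namespace pointer, that branch would have to store the
pointer back, or the refcount decision would have to move under the
same lock.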
Why this is correct: rxe_ns_claim_sk() uses the mutex to make the
read-and-null operation atomic. Exactly one of the two racing paths,
rxe_ns_exit() or rxe_net_del(), gets a non-NULL pointer back and
proceeds to release the socket; the other gets NULL and does nothing.
After rcu_assign_pointer(…, NULL) inside rxe_ns_claim_sk(), the
synchronize_rcu() that udp_tunnel_sock_release() performs after
rcu_assign_sk_user_data() waits for RCU readers that may still hold
the old pointer before the socket is released.
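The "exactly one caller wins" property can be shown in isolation. The sketch below is a user-space model that swaps in a C11 atomic_exchange for the mutex used in the proposed patch; the exchange gives the same read-and-clear-in-one-step behaviour for a single pointer, but it is an illustration of the idea, not the patch itself. The claim_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct claim_sock { int releases; };                   /* counts releases */
struct claim_ns   { _Atomic(struct claim_sock *) sk4; };

/* Read and clear in one indivisible step; later callers get NULL. */
static struct claim_sock *claim_take(struct claim_ns *ns)
{
    return atomic_exchange(&ns->sk4, (struct claim_sock *)NULL);
}

/* Two would-be releasers claim in turn; returns the release count. */
static int claim_two_releasers(void)
{
    struct claim_sock sock = { 0 };
    struct claim_ns ns;

    atomic_init(&ns.sk4, &sock);

    struct claim_sock *a = claim_take(&ns);  /* e.g. namespace teardown */
    struct claim_sock *b = claim_take(&ns);  /* e.g. explicit device deletion */

    if (a)
        a->releases++;
    if (b)
        b->releases++;

    assert(b == NULL);                       /* the second claim finds nothing */
    return sock.releases;
}
```

claim_two_releasers() returns 1 whichever caller goes first, because ownership of the pointer is transferred by the claim rather than inferred from a separate read.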
The bug was introduced by the following two commits in the series merged in March 2026:
Primary: f1327abd6abe, “RDMA/rxe: Support RDMA link creation and
destruction per net namespace”. Author: Zhu Yanjun
<yanjun.zhu@linux.dev>, Date: 2026-03-12.
This commit introduced rxe_sock_put() and rxe_net_del() with the racy
socket release pattern: the namespace socket pointer is not cleared
atomically before the socket is released, and there is no coordination
with the rxe_ns_exit() path.
Supporting: 13f2a53c2a71, “RDMA/rxe: Add net namespace support for
IPv4/IPv6 sockets”. Author: Zhu Yanjun <yanjun.zhu@linux.dev>, Date:
2026-03-12.
This commit introduced rxe_ns.c with rxe_ns_exit(), the other release
path, which reads the socket pointer under an RCU read lock but does
not hold any lock that would prevent rxe_net_del() from reading the
same pointer concurrently.
No upstream fix has been identified for this
specific race within the current search budget. The commits that
appeared in the search
(RDMA/rxe: Fix double free in rxe_srq_from_init) address a
different double-free in the SRQ allocation path and do not cover the
UDP socket race.
Summary: The crash is a race-condition double-free
of a sockfs inode. The UDP tunnel socket shared per-network-namespace by
RXE devices can be released twice: once by rxe_ns_exit()
(namespace teardown) and once by rxe_sock_put() (explicit
RXE device deletion via nldev_dellink). Both code paths
independently read the same namespace socket pointer and call
sock_release() without any mutual exclusion. The second
sock_release() → iput() call triggers the
VFS_BUG_ON_INODE assertion because the sockfs inode is
already in I_FREEING | I_CLEAR state.
Root cause commits: f1327abd6abe
and 13f2a53c2a71,
both authored by Zhu Yanjun, merged in March 2026.
Confidence: High. The crash site, register values
(RBP=0x300), inode state, and the complete call chain are all consistent
with a double-sock_release() on the RXE UDP tunnel socket.
The introduced commits are the only changes to this code path that could
cause this race.
Recommendations:
1. Add a mutex to struct rxe_ns_sock and use it to serialize the
   read-and-clear of the socket pointer in both rxe_ns_exit() and the
   rxe_net_del() path (see the proposed fix above).
2. Report and submit the fix upstream to the RDMA mailing list
   (linux-rdma@vger.kernel.org), CC: Zhu Yanjun, Leon Romanovsky,
   David Ahern.
3. Consider whether the SK_REF_FOR_TUNNEL = 2 reference-count boundary
   in rxe_sock_put() is correct: with two devices sharing a socket
   (refcount=2), 2 > 2 is FALSE and the socket is fully released
   instead of just dropping one reference. This appears to be a
   separate off-by-one, but it is not the direct cause of this crash.
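The boundary question in recommendation 3 comes down to one comparison. The sketch below isolates the branch condition from rxe_sock_put(), assuming SK_REF_FOR_TUNNEL is 2 as stated above; the helper name is hypothetical, and the sketch only evaluates the comparison without judging which refcount values are reachable in practice.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_REF_FOR_TUNNEL 2   /* value as quoted in this report */

static_assert(SK_REF_FOR_TUNNEL == 2, "sketch assumes the quoted value");

/*
 * Mirrors the test in rxe_sock_put(): a refcount above the threshold takes
 * the __sock_put() branch, anything else takes the full release branch.
 */
static bool takes_release_branch(unsigned int refcnt)
{
    return !(refcnt > SK_REF_FOR_TUNNEL);
}
```

A refcount of exactly 2 already selects the release branch, while 3 only drops a reference; whether 2 can correspond to two live devices is the open point the recommendation raises.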