Kernel BUG Analysis: rxrpc client connection leak during socket teardown

Key Elements

Field Value Implication
UNAME 6.18.13-200.fc43.x86_64
DISTRO Fedora
DISTRO_VERSION fc43
SOURCEDIR oops-workdir/linux (tag: kernel-6.18.13-0)
VMLINUX oops-workdir/fedora/files/usr/lib/debug/lib/modules/6.18.13-200.fc43.x86_64/vmlinux
BASEDIR oops-workdir/fedora/files/
PROCESS krxrpcio/7001 Kernel RxRPC I/O thread for the client local endpoint at UDP port 7001
HARDWARE VMware, Inc. VMware Virtual Platform
CRASH_TYPE BUG (invalid opcode / ud2) Intentional kernel assertion failure
MSGID <CAPhRvkyZGKHRTBhV3P2PCCRxmRKGEvJQ0W5a9SMW3qwS2hp2Qw@mail.gmail.com>
MSGID_URL CAPhRvkyZGKHRTBhV3P2PCCRxmRKGEvJQ0W5a9SMW3qwS2hp2Qw@mail.gmail.com
CONFIG_REQUIRED (unconditional — fires in all builds) The BUG() in rxrpc_destroy_client_conn_ids() is unconditional
INTRODUCED-BY 9d35d880e0e4 rxrpc: Move client call connection to the I/O thread

Crash Classification

BUG (invalid opcode): kernel BUG at net/rxrpc/conn_client.c:64!

The Oops: invalid opcode: 0000 line confirms the crash was triggered by a ud2 instruction — the x86 encoding of BUG(). This is an intentional assertion failure: the kernel detected an invariant violation (a client connection was left registered in the connection-ID IDR when the local endpoint was being destroyed) and crashed deliberately.


Modules List

Module In Backtrace Debug Symbol Location
rxrpc Y oops-workdir/fedora/files/usr/lib/debug/lib/modules/6.18.13-200.fc43.x86_64/kernel/net/rxrpc/rxrpc.ko.debug
fcrypt
pcbc
ip6_udp_tunnel
krb5
udp_tunnel
rfkill
vmxnet3
(remaining modules omitted for brevity)

CPU Registers

Register Value
RAX 0x0000000000000000
RBX 0xffff88810a6b4800
RCX 0x0000000000000000
RDX 0x0000000000000000
RSI 0x0000000000000000
RDI 0xffff88810a6b4920
RBP 0xffff888123398000
R8 0xffffc900159cfdb8
R9 0xffff88810a6b4928
R10 0x0000000000000018
R11 0x0000000040000000
R12 0xffff88810a9cda00
R13 0xffff88810a6b4800
R14 0xffffc900159cfe70
R15 0xffff88812d0c2800
RSP 0xffffc900159cfdd8
RIP 0x160d8 (rxrpc_purge_client_connections+0x58)
CR2 0x00007faf20630030
CR3 0x000000000382e002
CR4 0x00000000003706f0
EFLAGS 0x00010246
CS 0x0010

Code Bytes

28 01 00 00 00 74 25 31 c0 48 8d 74 24 0c 48 89
cf 89 44 24 0c 48 89 0c 24 e8 d4 ec c2 c1 48 89
c6 48 85 c0 0f 85 49 dd 01 00 <0f> 0b 31 f6 48
89 cf 48 89 0c 24 e8 c8 aa c4 c1 48 8b 0c 24 85
c0

The <0f> 0b bytes at the RIP are the ud2 instruction — confirming this is a deliberate BUG() assertion.


Backtrace

Address Function Offset Size Context Module Source Location
rxrpc_destroy_client_conn_ids (inlined) Task rxrpc net/rxrpc/conn_client.c:64
0x160d8 (0x16080 + 0x58) rxrpc_purge_client_connections 0x58 0xa0 Task rxrpc net/rxrpc/conn_client.c:145
0x21ab9 (0x219f0 + 0xc9) rxrpc_destroy_local 0xc9 0xe0 Task rxrpc net/rxrpc/local_object.c:451
0x1f3cd (0x1ed70 + 0x65d) rxrpc_io_thread 0x65d 0x750 Task rxrpc net/rxrpc/io_thread.c:598
0xffffffff813f24ec (0xffffffff813f23f0 + 0xfc) kthread 0xfc 0x240 Task vmlinux kernel/kthread.c:463
0xffffffff8132ab54 (0xffffffff8132aa60 + 0xf4) ret_from_fork 0xf4 0x110 Task vmlinux arch/x86/kernel/process.c:158
0xffffffff812d8dca (0xffffffff812d8db0 + 0x1a) ret_from_fork_asm 0x1a 0x30 Task vmlinux arch/x86/entry/entry_64.S:245

Note: ?-marked entries (__pfx_rxrpc_io_thread, __pfx_kthread) excluded per backtrace rules (more than 2 high-confidence entries present).


Source Code

Crash site: rxrpc_destroy_client_conn_ids (inlined into rxrpc_purge_client_connections)

net/rxrpc/conn_client.c @ kernel-6.18.13-0

 54 static void rxrpc_destroy_client_conn_ids(struct rxrpc_local *local)
 55 {
 56     struct rxrpc_connection *conn;
 57     int id;
 58
 59     if (!idr_is_empty(&local->conn_ids)) {
 60         idr_for_each_entry(&local->conn_ids, conn, id) {
 61             pr_err("AF_RXRPC: Leaked client conn %p {%d}\n",
 62                    conn, refcount_read(&conn->ref));
 63         }
 64         BUG();   // <- crash here: conn_ids IDR is not empty at endpoint destruction
 65     }
 66
 67     idr_destroy(&local->conn_ids);
 68 }

net/rxrpc/conn_client.c @ kernel-6.18.13-0

143 void rxrpc_purge_client_connections(struct rxrpc_local *local)
144 {
145     rxrpc_destroy_client_conn_ids(local);   // <- call here
146 }

Caller: rxrpc_destroy_local

net/rxrpc/local_object.c @ kernel-6.18.13-0

420 void rxrpc_destroy_local(struct rxrpc_local *local)
421 {
422     struct socket *socket = local->socket;
423     struct rxrpc_net *rxnet = local->rxnet;
    ...
427     local->dead = true;
    ...
433     rxrpc_clean_up_local_conns(local);       // only cleans idle_client_conns list
434     rxrpc_service_connection_reaper(&rxnet->service_conn_reaper);
435     ASSERT(!local->service);
    ...
450     rxrpc_purge_queue(&local->rx_queue);
451     rxrpc_purge_client_connections(local);   // <- call here -> BUG fires inside
452     page_frag_cache_drain(&local->tx_alloc);
453 }

rxrpc_clean_up_local_conns — the incomplete cleanup

net/rxrpc/conn_client.c @ kernel-6.18.13-0

813 void rxrpc_clean_up_local_conns(struct rxrpc_local *local)
814 {
815     struct rxrpc_connection *conn;
816
817     local->kill_all_client_conns = true;
818
819     timer_delete_sync(&local->client_conn_reap_timer);
820
821     while ((conn = list_first_entry_or_null(&local->idle_client_conns,
822                         struct rxrpc_connection, cache_link))) {
     // Only processes connections on idle_client_conns -- connections
     // in bundles (bundle->conns[]) that have not yet gone idle are missed.
823         list_del_init(&conn->cache_link);
824         atomic_dec(&conn->active);
825         trace_rxrpc_client(conn, -1, rxrpc_client_discard);
826         rxrpc_unbundle_conn(conn);
827         rxrpc_put_connection(conn, rxrpc_conn_put_local_dead);
828     }
829 }

I/O thread exit path

net/rxrpc/io_thread.c @ kernel-6.18.13-0

554     if (!list_empty(&local->new_client_calls))
555         rxrpc_connect_client_calls(local);   // allocates connections, moves calls to bundles
    ...
569     if (should_stop)
570         break;    // exits loop when kthread_should_stop() and queues empty
    ...
596     __set_current_state(TASK_RUNNING);
597     rxrpc_see_local(local, rxrpc_local_stop);
598     rxrpc_destroy_local(local);              // <- call here

What — How — Where Analysis

What

In rxrpc_destroy_client_conn_ids() (inlined into rxrpc_purge_client_connections()), the IDR local->conn_ids is found to be non-empty. The kernel prints:

rxrpc: AF_RXRPC: Leaked client conn 00000000bf02a6a7 {1}

and then fires BUG() at net/rxrpc/conn_client.c:64. The leaked connection has refcount=1, meaning it was allocated but never put. The connection is registered in the conn_ids IDR but was not cleaned up before rxrpc_destroy_local() was called by the I/O thread during socket teardown.

How

When a client calls sendmsg() on an AF_RXRPC socket to initiate a call, the call is placed on local->new_client_calls. The I/O thread picks it up in its main loop at io_thread.c:554–555 via rxrpc_connect_client_calls(). Inside that function, a client connection is allocated via rxrpc_add_conn_to_bundle() → rxrpc_alloc_client_connection(). This allocates an rxrpc_connection object with refcount=1 and registers it in local->conn_ids (the IDR). The connection is stored in bundle->conns[slot] and bundle->conn_ids[slot]. At this point the call is moved from new_client_calls to bundle->waiting_calls, and new_client_calls becomes empty.

Now the race: after rxrpc_connect_client_calls() returns, the I/O thread re-evaluates its exit condition (lines 558–570). If kthread_should_stop() is true and all work queues (including new_client_calls) appear empty, the thread exits the loop and calls rxrpc_destroy_local().

Inside rxrpc_destroy_local():

  1. rxrpc_clean_up_local_conns() is called. It sets kill_all_client_conns=true and iterates over local->idle_client_conns to free connections that have gone idle. The connection allocated in the step above is NOT on idle_client_conns: it is in the bundle's conns[] array, waiting to be activated for the pending call. This connection is completely missed by rxrpc_clean_up_local_conns().

  2. The socket is shut down, queues are purged.

  3. rxrpc_purge_client_connections() → rxrpc_destroy_client_conn_ids() is called. It finds local->conn_ids non-empty, logs the leaked connection, and fires BUG().

The root cause is a coverage gap in rxrpc_clean_up_local_conns(): it only iterates local->idle_client_conns but does not iterate connections in client bundles (local->client_bundles RB-tree → bundle->conns[]). A connection allocated for a pending call that hasn’t yet been activated on a channel (and thus never went idle) falls through this gap.

This gap was introduced by commit 9d35d880e0e4 (“rxrpc: Move client call connection to the I/O thread”), which moved connection allocation into the I/O thread as part of call setup. Prior to that change, the connection lifecycle was managed differently, and the idle-list cleanup was sufficient. After the change, connections can be in a “bundle-allocated but not yet idle” state that the cleanup path does not handle.

A related fix, fc9de52de38f (“rxrpc: Fix missing locking causing hanging calls”), is already included in kernel 6.18.13. That commit added a missing lock around rxrpc_disconnect_client_call()’s removal of a call from new_client_calls, preventing list corruption. It does not address the idle-list coverage gap described above.

Where

The fix must ensure that when rxrpc_destroy_local() tears down a local endpoint, all client connections registered in local->conn_ids are properly cleaned up — not just those that have reached the idle state.

Two approaches:

  1. Extend rxrpc_clean_up_local_conns() to also iterate over all entries in the local->client_bundles RB-tree, unbundling and putting each connection found in the bundle->conns[] slots. This mirrors what the idle-list loop does via rxrpc_unbundle_conn() + rxrpc_put_connection().

  2. Abort pending calls before teardown: In rxrpc_destroy_local(), before calling rxrpc_clean_up_local_conns(), abort all calls still in bundle->waiting_calls. When calls are aborted, their disconnect path will properly remove the connection from the bundle (via rxrpc_disconnect_client_call() → rxrpc_put_connection()), which will ultimately call rxrpc_kill_client_conn() → rxrpc_put_client_connection_id() to remove the connection from conn_ids.

Approach 1 is more direct and lower-risk. A sketch of the change would be to add the following after the idle-list loop in rxrpc_clean_up_local_conns():

/* Also clean up any connections still in bundles (not yet idle). */
spin_lock(&local->client_bundles_lock);
while (!RB_EMPTY_ROOT(&local->client_bundles)) {
    struct rxrpc_bundle *bundle =
        rb_entry(rb_first(&local->client_bundles),
                 struct rxrpc_bundle, local_node);
    struct rxrpc_connection *conn = NULL;
    int i;

    /* Find a populated slot in this bundle. */
    for (i = 0; i < ARRAY_SIZE(bundle->conns); i++) {
        conn = bundle->conns[i];
        if (conn)
            break;
    }
    if (!conn)
        break;      /* bundle pinned only by waiting calls; see approach 2 */

    /* Drop the lock to unbundle and put the conn; this may free the
     * bundle, so restart from rb_first() rather than reusing it. */
    spin_unlock(&local->client_bundles_lock);
    rxrpc_unbundle_conn(conn);
    rxrpc_put_connection(conn, rxrpc_conn_put_local_dead);
    spin_lock(&local->client_bundles_lock);
}
spin_unlock(&local->client_bundles_lock);

The exact implementation should be reviewed by the rxrpc maintainer (David Howells), as additional locking considerations may apply.


Bug Introduction

The bug was introduced by commit 9d35d880e0e4 (“rxrpc: Move client call connection to the I/O thread”, 2022-10-19).

That commit moved connection allocation out of the app-thread sendmsg path and into the I/O thread, creating a new “allocated in bundle, not yet idle” state for connections. The existing rxrpc_clean_up_local_conns() function only handles the idle_client_conns list and was not updated to also cover the new state.

Field Value
INTRODUCED-BY 9d35d880e0e4 rxrpc: Move client call connection to the I/O thread

Searched git history (git log ^kernel-6.18.13-0 origin/master -- net/rxrpc/) for commits that fix the specific coverage gap in rxrpc_clean_up_local_conns(). No commit addressing the non-idle connection cleanup was found as of the search.

The fix fc9de52de38f (“rxrpc: Fix missing locking causing hanging calls”) is already present in kernel-6.18.13-0 and addresses a different (though related) bug in the same code path.

No upstream fix was identified for this specific issue within the search budget.


Analysis, Conclusions, and Recommendations

Conclusion (high confidence): The kernel BUG at net/rxrpc/conn_client.c:64 is triggered when an AF_RXRPC client socket is closed while the I/O thread has already allocated a client connection for a pending call but that connection has not yet been activated on a channel (and therefore never appears on the idle_client_conns list). The rxrpc_clean_up_local_conns() function misses this connection, leaving it registered in the conn_ids IDR, which then trips the BUG assertion in rxrpc_destroy_client_conn_ids().

This is an Unprivileged Application crash: a regular user can trigger it by creating an AF_RXRPC socket and closing it rapidly while the I/O thread is mid-connection-setup. No root privileges are required; the rxrpc_create() path has no capability check. The rxrpc module must be loaded, which is the case on any system running the AFS client (kafs) or where the module has been manually loaded.

Recommendation: The rxrpc maintainer (David Howells) should extend rxrpc_clean_up_local_conns() to also release connections that are stored in bundle->conns[] but have not yet appeared on idle_client_conns, ensuring rxrpc_destroy_client_conn_ids() always finds an empty IDR. The reproducer provided in the bug report reliably triggers the issue and can serve as a regression test.


Security Note

The Linux Kernel CVE team is likely to assign a CVE to this issue (Unprivileged Application crash, no upstream fix identified).


HTML Report

See report.html