# Kernel BUG Analysis: rxrpc client connection leak during socket teardown

## Key Elements

| Field | Value | Implication |
| ----- | ----- | ----------- |
| UNAME | `6.18.13-200.fc43.x86_64` | |
| DISTRO | Fedora | |
| DISTRO_VERSION | fc43 | |
| SOURCEDIR | `oops-workdir/linux` (tag: `kernel-6.18.13-0`) | |
| VMLINUX | `oops-workdir/fedora/files/usr/lib/debug/lib/modules/6.18.13-200.fc43.x86_64/vmlinux` | |
| BASEDIR | `oops-workdir/fedora/files/` | |
| PROCESS | `krxrpcio/7001` | Kernel RxRPC I/O thread for the client local endpoint at UDP port 7001 |
| HARDWARE | `VMware, Inc. VMware Virtual Platform` | |
| CRASH_TYPE | BUG (invalid opcode / ud2) | Intentional kernel assertion failure |
| MSGID | `` | |
| MSGID_URL | [CAPhRvkyZGKHRTBhV3P2PCCRxmRKGEvJQ0W5a9SMW3qwS2hp2Qw@mail.gmail.com](https://lore.kernel.org/r/CAPhRvkyZGKHRTBhV3P2PCCRxmRKGEvJQ0W5a9SMW3qwS2hp2Qw@mail.gmail.com) | |
| CONFIG_REQUIRED | `(unconditional — fires in all builds)` | The `BUG()` in `rxrpc_destroy_client_conn_ids()` is unconditional |
| INTRODUCED-BY | [`9d35d880e0e4`](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9d35d880e0e4a3ab32d8c12f9e4d76198aadd42d) rxrpc: Move client call connection to the I/O thread | |

---

## Crash Classification

**BUG (invalid opcode)** — `kernel BUG at net/rxrpc/conn_client.c:64!`

The `Oops: invalid opcode: 0000` line confirms the crash was triggered by a `ud2` instruction — the x86 encoding of `BUG()`. This is an intentional assertion failure: the kernel detected an invariant violation (a client connection was left registered in the connection-ID IDR when the local endpoint was being destroyed) and crashed deliberately.
---

## Modules List

| Module | Flags | Backtrace | Location | Flag Implication |
| ------ | ----- | --------- | -------- | ---------------- |
| rxrpc | | Y | `oops-workdir/fedora/files/usr/lib/debug/lib/modules/6.18.13-200.fc43.x86_64/kernel/net/rxrpc/rxrpc.ko.debug` | |
| fcrypt | | | | |
| pcbc | | | | |
| ip6_udp_tunnel | | | | |
| krb5 | | | | |
| udp_tunnel | | | | |
| rfkill | | | | |
| vmxnet3 | | | | |
| *(remaining modules omitted for brevity)* | | | | |

---

## CPU Registers

| Register | Value |
| -------- | ----- |
| RAX | `0x0000000000000000` |
| RBX | `0xffff88810a6b4800` |
| RCX | `0x0000000000000000` |
| RDX | `0x0000000000000000` |
| RSI | `0x0000000000000000` |
| RDI | `0xffff88810a6b4920` |
| RBP | `0xffff888123398000` |
| R8 | `0xffffc900159cfdb8` |
| R9 | `0xffff88810a6b4928` |
| R10 | `0x0000000000000018` |
| R11 | `0x0000000040000000` |
| R12 | `0xffff88810a9cda00` |
| R13 | `0xffff88810a6b4800` |
| R14 | `0xffffc900159cfe70` |
| R15 | `0xffff88812d0c2800` |
| RSP | `0xffffc900159cfdd8` |
| RIP | `0x160d8` (rxrpc_purge_client_connections+0x58) |
| CR2 | `0x00007faf20630030` |
| CR3 | `0x000000000382e002` |
| CR4 | `0x00000000003706f0` |
| EFLAGS | `0x00010246` |
| CS | `0x0010` |

---

## Code Bytes

```
28 01 00 00 00 74 25 31 c0 48 8d 74 24 0c 48 89 cf 89 44 24 0c 48 89 0c 24 e8 d4 ec c2 c1 48 89 c6 48 85 c0 0f 85 49 dd 01 00 <0f> 0b 31 f6 48 89 cf 48 89 0c 24 e8 c8 aa c4 c1 48 8b 0c 24 85 c0
```

The `<0f> 0b` bytes at the RIP are the `ud2` instruction — confirming this is a deliberate `BUG()` assertion.
---

## Backtrace

| Address | Function | Offset | Size | Context | Module | Source Location |
| ------- | -------- | ------ | ---- | ------- | ------ | --------------- |
| | `rxrpc_destroy_client_conn_ids` (inlined) | | | Task | rxrpc | `net/rxrpc/conn_client.c:64` |
| `0x160d8 (0x16080 + 0x58)` | `rxrpc_purge_client_connections` | `0x58` | `0xa0` | Task | rxrpc | `net/rxrpc/conn_client.c:145` |
| `0x21ab9 (0x219f0 + 0xc9)` | `rxrpc_destroy_local` | `0xc9` | `0xe0` | Task | rxrpc | `net/rxrpc/local_object.c:451` |
| `0x1f3cd (0x1ed70 + 0x65d)` | `rxrpc_io_thread` | `0x65d` | `0x750` | Task | rxrpc | `net/rxrpc/io_thread.c:598` |
| `0xffffffff813f24ec (0xffffffff813f23f0 + 0xfc)` | `kthread` | `0xfc` | `0x240` | Task | vmlinux | `kernel/kthread.c:463` |
| `0xffffffff8132ab54 (0xffffffff8132aa60 + 0xf4)` | `ret_from_fork` | `0xf4` | `0x110` | Task | vmlinux | `arch/x86/kernel/process.c:158` |
| `0xffffffff812d8dca (0xffffffff812d8db0 + 0x1a)` | `ret_from_fork_asm` | `0x1a` | `0x30` | Task | vmlinux | `arch/x86/entry/entry_64.S:245` |

Note: `?`-marked entries (`__pfx_rxrpc_io_thread`, `__pfx_kthread`) excluded per backtrace rules (more than 2 high-confidence entries present).
---

## Source Code

### Crash site: `rxrpc_destroy_client_conn_ids` (inlined into `rxrpc_purge_client_connections`)

[`net/rxrpc/conn_client.c` @ `kernel-6.18.13-0`](https://gitlab.com/cki-project/kernel-ark/-/blob/kernel-6.18.13-0/net/rxrpc/conn_client.c?ref_type=tags#L54)

```c
54 static void rxrpc_destroy_client_conn_ids(struct rxrpc_local *local)
55 {
56 	struct rxrpc_connection *conn;
57 	int id;
58
59 	if (!idr_is_empty(&local->conn_ids)) {
60 		idr_for_each_entry(&local->conn_ids, conn, id) {
61 			pr_err("AF_RXRPC: Leaked client conn %p {%d}\n",
62 			       conn, refcount_read(&conn->ref));
63 		}
64 		BUG(); // <- crash here: conn_ids IDR is not empty at endpoint destruction
65 	}
66
67 	idr_destroy(&local->conn_ids);
68 }
```

[`net/rxrpc/conn_client.c` @ `kernel-6.18.13-0`](https://gitlab.com/cki-project/kernel-ark/-/blob/kernel-6.18.13-0/net/rxrpc/conn_client.c?ref_type=tags#L143)

```c
143 void rxrpc_purge_client_connections(struct rxrpc_local *local)
144 {
145 	rxrpc_destroy_client_conn_ids(local); // <- call here
146 }
```

### Caller: `rxrpc_destroy_local`

[`net/rxrpc/local_object.c` @ `kernel-6.18.13-0`](https://gitlab.com/cki-project/kernel-ark/-/blob/kernel-6.18.13-0/net/rxrpc/local_object.c?ref_type=tags#L420)

```c
420 void rxrpc_destroy_local(struct rxrpc_local *local)
421 {
422 	struct socket *socket = local->socket;
423 	struct rxrpc_net *rxnet = local->rxnet;
...
427 	local->dead = true;
...
433 	rxrpc_clean_up_local_conns(local); // only cleans idle_client_conns list
434 	rxrpc_service_connection_reaper(&rxnet->service_conn_reaper);
435 	ASSERT(!local->service);
...
450 	rxrpc_purge_queue(&local->rx_queue);
451 	rxrpc_purge_client_connections(local); // <- call here -> BUG fires inside
452 	page_frag_cache_drain(&local->tx_alloc);
453 }
```

### `rxrpc_clean_up_local_conns` — the incomplete cleanup

[`net/rxrpc/conn_client.c` @ `kernel-6.18.13-0`](https://gitlab.com/cki-project/kernel-ark/-/blob/kernel-6.18.13-0/net/rxrpc/conn_client.c?ref_type=tags#L813)

```c
813 void rxrpc_clean_up_local_conns(struct rxrpc_local *local)
814 {
815 	struct rxrpc_connection *conn;
816
817 	local->kill_all_client_conns = true;
818
819 	timer_delete_sync(&local->client_conn_reap_timer);
820
821 	while ((conn = list_first_entry_or_null(&local->idle_client_conns,
822 					struct rxrpc_connection, cache_link))) {
		// Only processes connections on idle_client_conns -- connections
		// in bundles (bundle->conns[]) that have not yet gone idle are missed.
823 		list_del_init(&conn->cache_link);
824 		atomic_dec(&conn->active);
825 		trace_rxrpc_client(conn, -1, rxrpc_client_discard);
826 		rxrpc_unbundle_conn(conn);
827 		rxrpc_put_connection(conn, rxrpc_conn_put_local_dead);
828 	}
829 }
```

### I/O thread exit path

[`net/rxrpc/io_thread.c` @ `kernel-6.18.13-0`](https://gitlab.com/cki-project/kernel-ark/-/blob/kernel-6.18.13-0/net/rxrpc/io_thread.c?ref_type=tags#L554)

```c
554 	if (!list_empty(&local->new_client_calls))
555 		rxrpc_connect_client_calls(local); // allocates connections, moves calls to bundles
...
569 	if (should_stop)
570 		break; // exits loop when kthread_should_stop() and queues empty
...
596 	__set_current_state(TASK_RUNNING);
597 	rxrpc_see_local(local, rxrpc_local_stop);
598 	rxrpc_destroy_local(local); // <- call here
```

---

## What — How — Where Analysis

### What

In `rxrpc_destroy_client_conn_ids()` (inlined into `rxrpc_purge_client_connections()`), the IDR `local->conn_ids` is found to be non-empty. The kernel prints:

```
rxrpc: AF_RXRPC: Leaked client conn 00000000bf02a6a7 {1}
```

and then fires `BUG()` at `net/rxrpc/conn_client.c:64`.
The leaked connection has `refcount=1`, meaning it was allocated but never put. It is registered in the `conn_ids` IDR but was not cleaned up before `rxrpc_destroy_local()` was called by the I/O thread during socket teardown.

### How

When a client calls `sendmsg()` on an `AF_RXRPC` socket to initiate a call, the call is placed on `local->new_client_calls`. The I/O thread picks it up in its main loop at `io_thread.c:554–555` via `rxrpc_connect_client_calls()`. Inside that function, a client connection is allocated via `rxrpc_add_conn_to_bundle()` → `rxrpc_alloc_client_connection()`. This allocates an `rxrpc_connection` object with `refcount=1` and registers it in `local->conn_ids` (the IDR). The connection is stored in `bundle->conns[slot]` and `bundle->conn_ids[slot]`. At this point the call is moved from `new_client_calls` to `bundle->waiting_calls`, and `new_client_calls` becomes empty.

Now the race: after `rxrpc_connect_client_calls()` returns, the I/O thread re-evaluates its exit condition (lines 558–570). If `kthread_should_stop()` is true and all work queues (including `new_client_calls`) appear empty, the thread exits the loop and calls `rxrpc_destroy_local()`. Inside `rxrpc_destroy_local()`:

1. `rxrpc_clean_up_local_conns()` is called. It sets `kill_all_client_conns = true` and iterates over `local->idle_client_conns` to free connections that have gone idle. **The connection just allocated above is NOT on `idle_client_conns`** — it sits in the bundle's `conns[]` array, waiting to be activated for the pending call, so `rxrpc_clean_up_local_conns()` misses it entirely.
2. The socket is shut down and the queues are purged.
3. `rxrpc_purge_client_connections()` → `rxrpc_destroy_client_conn_ids()` is called. It finds `local->conn_ids` non-empty, logs the leaked connection, and fires `BUG()`.
The root cause is a coverage gap in `rxrpc_clean_up_local_conns()`: it only iterates `local->idle_client_conns` and never visits connections held in client bundles (`local->client_bundles` RB-tree → `bundle->conns[]`). A connection allocated for a pending call that has not yet been activated on a channel (and thus never went idle) falls through this gap.

This gap was introduced by commit [`9d35d880e0e4`](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9d35d880e0e4a3ab32d8c12f9e4d76198aadd42d) ("rxrpc: Move client call connection to the I/O thread"), which moved connection allocation into the I/O thread as part of call setup. Prior to that change, the connection lifecycle was managed differently and the idle-list cleanup was sufficient. After the change, connections can sit in a "bundle-allocated but not yet idle" state that the cleanup path does not handle.

A related fix, [`fc9de52de38f`](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fc9de52de38f656399d2ce40f7349a6b5f86e787) ("rxrpc: Fix missing locking causing hanging calls"), is already included in kernel 6.18.13. That commit added a missing lock around `rxrpc_disconnect_client_call()`'s removal of a call from `new_client_calls`, preventing list corruption. It does not address the idle-list coverage gap described above.

### Where

The fix must ensure that when `rxrpc_destroy_local()` tears down a local endpoint, **all** client connections registered in `local->conn_ids` are properly cleaned up — not just those that have reached the idle state. Two approaches:

1. **Extend `rxrpc_clean_up_local_conns()`** to also iterate over all entries in the `local->client_bundles` RB-tree, unbundling and putting each connection found in the `bundle->conns[]` slots. This mirrors what the idle-list loop does via `rxrpc_unbundle_conn()` + `rxrpc_put_connection()`.

2. **Abort pending calls before teardown**: in `rxrpc_destroy_local()`, before calling `rxrpc_clean_up_local_conns()`, abort all calls still on `bundle->waiting_calls`. When calls are aborted, their disconnect path removes the connection from the bundle (via `rxrpc_disconnect_client_call()` → `rxrpc_put_connection()`), which ultimately calls `rxrpc_kill_client_conn()` → `rxrpc_put_client_connection_id()` to drop the connection from `conn_ids`.

Approach 1 is more direct and lower-risk. A sketch of the change, to be added after the idle-list loop in `rxrpc_clean_up_local_conns()` (hedged: names such as `local_node` and `RXRPC_MAX_CONNS_PER_CLIENT` are illustrative, and the outer loop assumes `rxrpc_unbundle_conn()` removes an emptied bundle from the tree, without which it would not terminate):

```c
	/* Also clean up any connections still in bundles (not yet idle). */
	spin_lock(&local->client_bundles_lock);
	while (!RB_EMPTY_ROOT(&local->client_bundles)) {
		struct rxrpc_bundle *bundle =
			rb_entry(rb_first(&local->client_bundles),
				 struct rxrpc_bundle, local_node);

		/* Unbundle each occupied slot. */
		for (int i = 0; i < RXRPC_MAX_CONNS_PER_CLIENT; i++) {
			conn = bundle->conns[i];
			if (conn) {
				spin_unlock(&local->client_bundles_lock);
				rxrpc_unbundle_conn(conn);
				rxrpc_put_connection(conn, rxrpc_conn_put_local_dead);
				spin_lock(&local->client_bundles_lock);
			}
		}
	}
	spin_unlock(&local->client_bundles_lock);
```

The exact implementation should be reviewed by the rxrpc maintainer (David Howells), as additional locking considerations may apply.

---

## Bug Introduction

The bug was introduced by commit [`9d35d880e0e4`](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9d35d880e0e4a3ab32d8c12f9e4d76198aadd42d) ("rxrpc: Move client call connection to the I/O thread", 2022-10-19). That commit moved connection allocation out of the app-thread sendmsg path and into the I/O thread, creating a new "allocated in bundle, not yet idle" state for connections. The existing `rxrpc_clean_up_local_conns()` function only handles the `idle_client_conns` list and was not updated to also cover the new state.
| Field | Value |
| ----- | ----- |
| INTRODUCED-BY | [`9d35d880e0e4`](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9d35d880e0e4a3ab32d8c12f9e4d76198aadd42d) rxrpc: Move client call connection to the I/O thread |

---

## Upstream Fix Search

Searched git history (`^kernel-6.18.13-0 origin/master -- net/rxrpc/`) for commits that fix the specific coverage gap in `rxrpc_clean_up_local_conns()`. No commit addressing the non-idle connection cleanup was found as of the search. The fix `fc9de52de38f` ("rxrpc: Fix missing locking causing hanging calls") is already present in `kernel-6.18.13-0` and addresses a different (though related) bug in the same code path.

**No upstream fix was identified for this specific issue within the search budget.**

---

## Analysis, Conclusions, and Recommendations

**Conclusion (high confidence):** The kernel BUG at `net/rxrpc/conn_client.c:64` is triggered when an AF_RXRPC client socket is closed while the I/O thread has already allocated a client connection for a pending call but that connection has not yet been activated on a channel (and therefore never appears on the `idle_client_conns` list). The `rxrpc_clean_up_local_conns()` function misses this connection, leaving it registered in the `conn_ids` IDR, which then trips the BUG assertion in `rxrpc_destroy_client_conn_ids()`.

This is an **Unprivileged Application** crash: a regular user can trigger it by creating an `AF_RXRPC` socket and closing it rapidly while the I/O thread is mid-connection-setup. No root privileges are required; the `rxrpc_create()` path has no capability check. The `rxrpc` module must be loaded, which is the case on any system running the AFS client (kafs) or where the module has been manually loaded.
**Recommendation:** The rxrpc maintainer (David Howells) should extend `rxrpc_clean_up_local_conns()` to also release connections that are stored in `bundle->conns[]` but have not yet appeared on `idle_client_conns`, ensuring `rxrpc_destroy_client_conn_ids()` always finds an empty IDR. The reproducer provided in the bug report reliably triggers the issue and can serve as a regression test.

---

## Security Note

The Linux Kernel CVE team is likely to assign a CVE to this issue (**Unprivileged Application** crash, no upstream fix identified).

---

## HTML Report

See [report.html](http://oops.fenrus.org/reports/lkml/CAPhRvkyZGKHRTBhV3P2PCCRxmRKGEvJQ0W5a9SMW3qwS2hp2Qw/report.html)