# Linux kernel crash report

Syzbot report via email, MSGID [69ed492c.050a0220.e51af.0005.GAE@google.com](https://lore.kernel.org/r/69ed492c.050a0220.e51af.0005.GAE@google.com). Subject: `[syzbot] [bluetooth?] WARNING in hci_send_cmd (4)`. Dashboard: https://syzkaller.appspot.com/bug?extid=00f5a866124dc44cce14

> **Root cause**: `hci_send_cmd()` lacks an `HCI_UP` guard, allowing it to queue work onto `hdev->workqueue` after the workqueue has entered the draining/destroying state during HCI device shutdown.

## Key elements

| Field | Value | Implication |
| ----- | ----- | ----------- |
| CRASH_TYPE | WARNING | |
| WARNING_MSG | `workqueue: cannot queue hci_cmd_work on wq hci0` | |
| WARNING_SRC | `kernel/workqueue.c:2297` | WARN_ONCE in `__queue_work`: wq is being destroyed/drained |
| UNAME | `syzkaller #0 PREEMPT(full)` | |
| DISTRO | syzbot / mainline upstream | |
| COMPILER | Debian clang version 21.1.8 | |
| HEAD_COMMIT | `b4e07588e743` | |
| MSGID | `<69ed492c.050a0220.e51af.0005.GAE@google.com>` | |
| MSGID_URL | [69ed492c.050a0220.e51af.0005.GAE@google.com](https://lore.kernel.org/r/69ed492c.050a0220.e51af.0005.GAE@google.com) | |
| BUG_URL | https://syzkaller.appspot.com/bug?extid=00f5a866124dc44cce14 | |
| SYZBOT_OCCURRENCE | 4th occurrence | Recurring; not the first time syzbot has seen this |
| INTRODUCED-BY | [c347b765fe70](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c347b765fe70d65ef0a2b1be8db91e5e84f73f4a) | `queue_work` on `hdev->workqueue` without HCI_UP guard (2011) |
| PROCESS | `kworker/0:3` (PID 1378, CPU 0) | Kernel worker thread |
| WQ_CONTEXT | `Workqueue: events l2cap_info_timeout` | Running in the `events` workqueue |
| HARDWARE | QEMU Standard PC (Q35 + ICH9, 2009) | syzbot VM |
| BIOS | 1.16.3-debian-1.16.3-2 04/01/2014 | |
| TAINT | Not tainted | No taint flags set |
| CONFIG_REQUIRED | `CONFIG_BUG` (unconditional when enabled — standard in all builds) | |
| VMLINUX | `oops-workdir/syzbot/vmlinux-b4e07588` | |
| SOURCEDIR | `oops-workdir/linux` | checked out to b4e07588e743 |


## Kernel modules

| Module | Flags | Backtrace | Location | Flag Implication |
| ------ | ----- | --------- | -------- | ---------------- |
| *(module list not available in this report)* | | | | |


## Backtrace

| Address | Function | Offset | Size | Context | Module | Source location |
| ------- | -------- | ------ | ---- | ------- | ------ | --------------- |
| `0xffffffff818d5d2a (0xffffffff818d4fe0 + 0xd4a)` | `__queue_work` | `0xd4a` | `0xfc0` | Task | *(built-in)* | [kernel/workqueue.c:2297](#1-__queue_work--crash-site-kernelworkqueuec2297) |
| `0xffffffff818d4f06 (0xffffffff818d4e00 + 0x106)` | `queue_work_on` | `0x106` | `0x1d0` | Task | *(built-in)* | [kernel/workqueue.c:2432](#2-queue_work_on-kernelworkqueuec2432) |
| | `queue_work` (inlined) | | | Task | *(built-in)* | include/linux/workqueue.h:696 |
| `0xffffffff8aaa3767 (0xffffffff8aaa36b0 + 0xb7)` | `hci_send_cmd` | `0xb7` | `0x1a0` | Task | *(built-in)* | [net/bluetooth/hci_core.c:3111](#3-hci_send_cmd-netbluetoothhci_corec3111) |
| | `hci_conn_auth` (inlined) | | | Task | *(built-in)* | net/bluetooth/hci_conn.c:2459 |
| `0xffffffff8aabbac9 (0xffffffff8aabb530 + 0x599)` | `hci_conn_security` | `0x599` | `0xa80` | Task | *(built-in)* | [net/bluetooth/hci_conn.c:2551](#4-hci_conn_security-netbluetoothhci_connc2551) |
| `0xffffffff8ab70f0c (0xffffffff8ab70b50 + 0x3bc)` | `l2cap_conn_start` | `0x3bc` | `0xf20` | Task | *(built-in)* | net/bluetooth/l2cap_core.c:1534 |
| `0xffffffff8ab707c8 (0xffffffff8ab70760 + 0x68)` | `l2cap_info_timeout` | `0x68` | `0xa0` | Task | *(built-in)* | net/bluetooth/l2cap_core.c:1685 |
| | `process_one_work` (inlined) | | | Task | *(built-in)* | kernel/workqueue.c:3302 |
| `0xffffffff818eb2ed (0xffffffff818ea790 + 0xb5d)` | `process_scheduled_works` | `0xb5d` | `0x1860` | Task | *(built-in)* | kernel/workqueue.c:3385 |
| `0xffffffff818f3353 (0xffffffff818f2900 + 0xa53)` | `worker_thread` | `0xa53` | `0xfc0` | Task | *(built-in)* | kernel/workqueue.c:3466 |
| `0xffffffff8190ae58 (0xffffffff8190aad0 + 0x388)` | `kthread` | `0x388` | `0x470` | Task | *(built-in)* | kernel/kthread.c:436 |
| `0xffffffff816bfae4 (0xffffffff816bf5d0 + 0x514)` | `ret_from_fork` | `0x514` | `0xb70` | Task | *(built-in)* | arch/x86/kernel/process.c:158 |
| `0xffffffff813370aa (0xffffffff81337090 + 0x1a)` | `ret_from_fork_asm` | `0x1a` | `0x30` | Task | *(built-in)* | arch/x86/entry/entry_64.S:245 |


## CPU Registers

```
RIP: 0010:__queue_work+0xd4a/0xfc0 kernel/workqueue.c:2296
RSP: 0018:ffffc9000257f720 EFLAGS: 00010082
RAX: 1ffff110081cc181  [= R13 >> 3, KASAN shadow index for the work->func load]
RBX: 0000000000000008  [work flags / small integer]
RCX: ffff888000260000
RDX: ffff888040182170  [→ wq->name (R15), points to "hci0" string after offset add]
RSI: ffffffff8aa9ccd0  [= hci_cmd_work — work->func loaded from *R13]
RDI: ffffffff90368d70  [= __start___bug_table+0xd530 (R14), BUG table entry for WARN]
RBP: 0000000000000020
R08: ffff888040e60bf7
R09: 1ffff110081cc17e  [= R08 >> 3, KASAN shadow index]
R10: dffffc0000000000  [KASAN shadow offset]
R11: ffffed10081cc17f  [KASAN shadow address]
R12: dffffc0000000000  [KASAN shadow offset]
R13: ffff888040e60c08  [= &work->func (work + 0x18); *R13 = hci_cmd_work]
R14: ffffffff90368d70  [= __start___bug_table+0xd530, BUG table entry]
R15: ffff888040182170  [→ wq->name after add $0x170 = "hci0"]
FS:  0000000000000000(0000)  GS:ffff88808c80c000(0000)  knlGS:0000000000000000
CS:  0010  DS: 0000  ES: 0000  CR0: 0000000080050033
CR2: 0000200000005dc0
CR3: 0000000012a31000  CR4: 0000000000352ef0
```

Code bytes at crash:
```
Code: 83 c5 18 4c 89 e8 48 c1 e8 03 42 80 3c 20 00 74 08 4c 89 ef e8 17 4d a5 00 49 8b 75 00
      49 81 c7 70 01 00 00 4c 89 f7 4c 89 fa <67> 48 0f b9 3a  ← trapping instruction (ud1)
      48 83 c4 58 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc
```

Disassembly window from vmlinux around crash site (`__queue_work+0xd4a`):
```
ffffffff818d5cec:  mov    0x18(%rsp),%r15
ffffffff818d5cf1:  jmp    ffffffff818d5cf8 <__queue_work+0xd18>
ffffffff818d5cf3:  call   ffffffff81c5e0f0 <__sanitizer_cov_trace_pc>
ffffffff818d5cf8:  lea    0xea93071(%rip),%r14   # ffffffff90368d70 <__start___bug_table+0xd530>
ffffffff818d5cff:  add    $0x18,%r13
ffffffff818d5d03:  mov    %r13,%rax
ffffffff818d5d06:  shr    $0x3,%rax
ffffffff818d5d0a:  cmpb   $0x0,(%rax,%r12,1)    ; KASAN shadow check
ffffffff818d5d0f:  je     ffffffff818d5d19 <__queue_work+0xd39>
ffffffff818d5d11:  mov    %r13,%rdi
ffffffff818d5d14:  call   ffffffff8232aa30 <__asan_report_load8_noabort>
ffffffff818d5d19:  mov    0x0(%r13),%rsi         ; RSI = work->func
ffffffff818d5d1d:  add    $0x170,%r15            ; R15 = wq->name
ffffffff818d5d24:  mov    %r14,%rdi              ; RDI = BUG table entry
ffffffff818d5d27:  mov    %r15,%rdx              ; RDX = wq->name
ffffffff818d5d2a:  call   ffffffff8bbd4e98 <__SCT__WARN_trap>   <<<< crash
ffffffff818d5d2f:  add    $0x58,%rsp
ffffffff818d5d33:  pop    %rbx
ffffffff818d5d34:  pop    %r12
ffffffff818d5d36:  pop    %r13
ffffffff818d5d38:  pop    %r14
```

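Note: the code bytes from the oops show `ud1 (%edx),%rdi` (`67 48 0f b9 3a`) at the crash address,
while the vmlinux disassembly shows `call __SCT__WARN_trap` there. Both encodings are 5 bytes long.
The `call` is a static-call site (`__SCT__` is the static-call trampoline prefix); the running kernel
appears to have patched it in place with the `ud1` WARN trap, which is why RIP points into
`__queue_work` itself.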
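
The register annotations and the `shr $0x3` / `cmpb $0x0,(%rax,%r12,1)` sequence in the disassembly can be cross-checked with a little address arithmetic. The sketch below is a userspace illustration only: the helper names are hypothetical, and the shadow offset is simply the value seen in R10/R12 of this dump. It also recomputes the target of the RIP-relative `lea` at `ffffffff818d5cf8`.

```c
#include <assert.h>
#include <stdint.h>

/* KASAN shadow offset as it appears in R10/R12 of the register dump. */
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* One shadow byte covers 8 bytes of memory, so the index is addr >> 3. */
static uint64_t kasan_shadow_index(uint64_t addr)
{
    return addr >> 3;
}

/* Address of the shadow byte that describes addr. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
    return kasan_shadow_index(addr) + KASAN_SHADOW_OFFSET;
}

/* RIP-relative target: address of the next instruction plus the disp32. */
static uint64_t rip_relative_target(uint64_t next_insn, int32_t disp)
{
    return next_insn + (uint64_t)(int64_t)disp;
}
```

With the values from this report, `kasan_shadow_index(0xffff888040e60c08)` (R13) gives `0x1ffff110081cc181` (RAX), and `rip_relative_target(0xffffffff818d5cff, 0xea93071)` gives `0xffffffff90368d70`, the BUG table entry held in R14 and RDI.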


## Backtrace source code

### 1. `__queue_work` — crash site (`kernel/workqueue.c:2297`)

[`kernel/workqueue.c` @ b4e07588e743](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/workqueue.c?id=b4e07588e743#l2275)

```c
2275  static void __queue_work(int cpu, struct workqueue_struct *wq,
2276                           struct work_struct *work)
2277  {
2278      struct pool_workqueue *pwq;
2279      struct worker_pool *last_pool, *pool;
2280      unsigned int work_flags;
2281      unsigned int req_cpu = cpu;
2282  
2283      /*
2284       * While a work item is PENDING && off queue, a task trying to
2285       * steal the PENDING will busy-loop waiting for it to either get
2286       * queued or lose PENDING.  Grabbing PENDING and queueing should
2287       * happen with IRQ disabled.
2288       */
2289      lockdep_assert_irqs_disabled();
2290  
2291      /*
2292       * For a draining wq, only works from the same workqueue are
2293       * allowed. The __WQ_DESTROYING helps to spot the issue that
2294       * queues a new work item to a wq after destroy_workqueue(wq).
2295       */
2296      if (unlikely(wq->flags & (__WQ_DESTROYING | __WQ_DRAINING) &&
2297  →              WARN_ONCE(!is_chained_work(wq), "workqueue: cannot queue %ps on wq %s\n",
2298                             work->func, wq->name))) {
2299          return;
2300      }
```
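
The condition at lines 2296 to 2298 is compact, so a userspace model may help when reading it. This is a sketch under assumptions, not kernel code: the `MODEL_*` flag values and `model_may_queue()` are hypothetical stand-ins, and the `chained` parameter replaces the real `is_chained_work(wq)` call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel defines its own flag bits. */
#define MODEL_WQ_DESTROYING (1u << 0)
#define MODEL_WQ_DRAINING   (1u << 1)

/*
 * Returns true when a work item may be queued on a workqueue whose
 * flags are wq_flags. `chained` models is_chained_work(): the caller
 * is itself a work item running on that same workqueue.
 */
static bool model_may_queue(unsigned int wq_flags, bool chained)
{
    bool shutting_down =
        (wq_flags & (MODEL_WQ_DESTROYING | MODEL_WQ_DRAINING)) != 0;

    if (shutting_down && !chained)
        return false;   /* the kernel warns once here and returns */
    return true;
}
```

In this report the caller is `l2cap_info_timeout` running on the `events` workqueue, so it is not chained work for `hci0` and the request is refused with the warning.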

### 2. `queue_work_on` (`kernel/workqueue.c:2432`)

[`kernel/workqueue.c` @ b4e07588e743](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/workqueue.c?id=b4e07588e743#l2422)

```c
2422  bool queue_work_on(int cpu, struct workqueue_struct *wq,
2423                     struct work_struct *work)
2424  {
2425      bool ret = false;
2426      unsigned long irq_flags;
2427  
2428      local_irq_save(irq_flags);
2429  
2430      if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)) &&
2431          !clear_pending_if_disabled(work)) {
2432  →          __queue_work(cpu, wq, work);
2433          ret = true;
2434      }
2435  
2436      local_irq_restore(irq_flags);
2437      return ret;
2438  }
```

### 3. `hci_send_cmd` (`net/bluetooth/hci_core.c:3111`)

[`net/bluetooth/hci_core.c` @ b4e07588e743](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/bluetooth/hci_core.c?id=b4e07588e743#l3092)

```c
3092  int hci_send_cmd(struct hci_dev *hdev, __u16 opcode, __u32 plen,
3093                   const void *param)
3094  {
3095      struct sk_buff *skb;
3096  
3097      BT_DBG("%s opcode 0x%4.4x plen %d", hdev->name, opcode, plen);
3098  
3099      skb = hci_cmd_sync_alloc(hdev, opcode, plen, param, NULL);
3100      if (!skb) {
3101          bt_dev_err(hdev, "no memory for command");
3102          return -ENOMEM;
3103      }
3104  
3105      /* Stand-alone HCI commands must be flagged as
3106       * single-command requests.
3107       */
3108      bt_cb(skb)->hci.req_flags |= HCI_REQ_START;
3109  
3110      skb_queue_tail(&hdev->cmd_q, skb);
3111  →  queue_work(hdev->workqueue, &hdev->cmd_work);
3112  
3113      return 0;
3114  }
```

The inlined caller `hci_conn_auth` (net/bluetooth/hci_conn.c:2459) calls `hci_send_cmd` with
`HCI_OP_AUTH_REQUESTED`:

[`net/bluetooth/hci_conn.c` @ b4e07588e743](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/bluetooth/hci_conn.c?id=b4e07588e743#l2438)

```c
2438  static int hci_conn_auth(struct hci_conn *conn, __u8 sec_level, __u8 auth_type)
2439  {
2440      ...
2455      if (!test_and_set_bit(HCI_CONN_AUTH_PEND, &conn->flags)) {
2456          struct hci_cp_auth_requested cp;
2457  
2458          cp.handle = cpu_to_le16(conn->handle);
2459  →      hci_send_cmd(conn->hdev, HCI_OP_AUTH_REQUESTED,
2460                       sizeof(cp), &cp);
2461          ...
2462      }
2463      return 0;
2464  }
```

### 4. `hci_conn_security` (`net/bluetooth/hci_conn.c:2551`)

[`net/bluetooth/hci_conn.c` @ b4e07588e743](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/bluetooth/hci_conn.c?id=b4e07588e743#l2487)

```c
2487  int hci_conn_security(struct hci_conn *conn, __u8 sec_level, __u8 auth_type,
2488                        bool initiator)
2489  {
2490      ...
2544  auth:
2545      if (test_bit(HCI_CONN_ENCRYPT_PEND, &conn->flags))
2546          return 0;
2547  
2548      if (initiator)
2549          set_bit(HCI_CONN_AUTH_INITIATOR, &conn->flags);
2550  
2551  →  if (!hci_conn_auth(conn, sec_level, auth_type))
2552          return 0;
2553  
2554  encrypt:
2555      if (test_bit(HCI_CONN_ENCRYPT, &conn->flags)) {
2556          if (!conn->enc_key_size)
2557              return 0;
2558          return 1;
2559      }
2560  
2561      hci_conn_encrypt(conn);
2562      return 0;
2563  }
```

---

## What

`hci_send_cmd()` at `net/bluetooth/hci_core.c:3111` calls
`queue_work(hdev->workqueue, &hdev->cmd_work)` after `hdev->workqueue` has
already entered the draining or destroying state during HCI device shutdown.
The workqueue core's `WARN_ONCE` in `__queue_work()` at
`kernel/workqueue.c:2296–2298` detects this and fires, printing:

```
workqueue: cannot queue hci_cmd_work on wq hci0
```

The crashing instruction is a `ud1` emitted by the WARN trap mechanism.
The function being queued is confirmed by RSI = `0xffffffff8aa9ccd0` =
`hci_cmd_work`, loaded through the pointer in R13 (`&work->func`). R15, after
the `add $0x170` just before the WARN, points to `wq->name` = `"hci0"`, and
the same value is in RDX at crash time.

`hci_send_cmd` does **not** check the `HCI_UP` flag in `hdev->flags` before
calling `queue_work`. Other entry points into the HCI device (e.g.
`hci_recv_frame`) include a `test_bit(HCI_UP, &hdev->flags)` check; this
function is missing that guard.

---

## How

**Q1**: How does `queue_work(hdev->workqueue, …)` get called on a
draining/destroying workqueue?

**A1**: `hci_send_cmd()` calls `queue_work(hdev->workqueue, &hdev->cmd_work)`
unconditionally, without checking whether the device is still up. No caller
in the `l2cap_info_timeout → hci_conn_security → hci_conn_auth → hci_send_cmd`
chain checks this either.

**Q2**: How does `hdev->workqueue` end up draining while `hci_send_cmd` can
still be reached?

**A2**: Race condition in `hci_dev_close_sync()` (`net/bluetooth/hci_sync.c`):

1. **Line 5322**: `test_and_clear_bit(HCI_UP, &hdev->flags)` clears `HCI_UP`.
2. **Line 5354**: `drain_workqueue(hdev->workqueue)` is called, which sets
   `__WQ_DRAINING` on `hdev->workqueue`.
3. **Line 5368**: `hci_conn_hash_flush(hdev)` → `l2cap_conn_del()` →
   `disable_delayed_work_sync(&conn->info_timer)`, which would cancel
   `l2cap_info_timeout`, but only **after** step 2.

`l2cap_info_timeout` runs on the **`events`** workqueue, which is entirely
separate from `hdev->workqueue`, so draining `hdev->workqueue` neither
flushes nor blocks it. If `l2cap_info_timeout` is already scheduled on the
`events` workqueue, it can fire after step 2 has set `__WQ_DRAINING` and
before step 3 cancels it:

- `l2cap_info_timeout` → `l2cap_conn_start` → `hci_conn_security` →
  `hci_conn_auth` → `hci_send_cmd`
- `hci_send_cmd` calls `queue_work(hdev->workqueue, &hdev->cmd_work)`
- `hdev->workqueue` is in `__WQ_DRAINING` state → **WARN fires**

The same race could also occur at `destroy_workqueue(hdev->workqueue)`
(lines 2681, 2750) if the device is unregistered while `l2cap_info_timeout`
is in flight, giving `__WQ_DESTROYING` instead of `__WQ_DRAINING`.
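
The interleaving can be replayed deterministically in userspace. The following is a minimal sketch, not the kernel's implementation: `struct model_hdev` and every `model_*` function are hypothetical, and real concurrency is replaced by a boolean that decides whether the timeout runs inside the window.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the state touched by the race; all names are hypothetical. */
struct model_hdev {
    bool up;           /* stands in for HCI_UP; unused by the unguarded send */
    bool wq_draining;  /* stands in for __WQ_DRAINING on hdev->workqueue */
    bool timer_armed;  /* info_timer still pending on the "events" workqueue */
    int  refused;      /* queue attempts rejected by the workqueue core */
};

/* Unguarded send: always tries to queue cmd_work. */
static void model_send_cmd(struct model_hdev *h)
{
    if (h->wq_draining)
        h->refused++;  /* corresponds to the WARN in __queue_work() */
}

/* Timeout callback: runs on "events", so the drain does not stop it. */
static void model_info_timeout(struct model_hdev *h)
{
    if (h->timer_armed) {
        h->timer_armed = false;
        model_send_cmd(h);
    }
}

/*
 * Replays the close sequence. timer_fires_in_window selects whether the
 * timeout runs after step 2 and before step 3. Returns the refusal count.
 */
static int model_close(bool timer_fires_in_window)
{
    struct model_hdev h = { .up = true, .timer_armed = true };

    h.up = false;            /* step 1: HCI_UP cleared */
    h.wq_draining = true;    /* step 2: draining flag set */
    if (timer_fires_in_window)
        model_info_timeout(&h);
    h.timer_armed = false;   /* step 3: info_timer disabled */
    model_info_timeout(&h);  /* no effect: the timer is no longer armed */
    return h.refused;
}
```

`model_close(true)` reports one refused queue attempt and `model_close(false)` reports none, which mirrors why the warning only appears when the timeout lands inside the window. Note that `h.up` is already false when the send happens, which is the property the proposed guard relies on.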

---

## Where

**Fix location**: `hci_send_cmd()` in `net/bluetooth/hci_core.c`.

Add an `HCI_UP` guard at the top of `hci_send_cmd()`, matching the pattern
already used in `hci_recv_frame()` (line 2922) and the `hci_dev_ioctl()`
handler (line 596). Because `HCI_UP` is cleared **before**
`drain_workqueue()` in `hci_dev_close_sync()`, this guard catches the
race precisely.

No resource leaks are introduced: the guard fires before any allocation.
No lock imbalance: no locks are held at the guard point. No side effects on
callers: all call sites that handle the return value already tolerate
negative errno codes.

**Proposed fix** (`PATCH_BASE: b4e07588e743`):

```diff
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3092,6 +3092,9 @@ int hci_send_cmd(struct hci_dev *hdev, __u16 opcode, __u32 plen,

 	BT_DBG("%s opcode 0x%4.4x plen %d", hdev->name, opcode, plen);

+	if (!test_bit(HCI_UP, &hdev->flags))
+		return -ENETDOWN;
+
 	skb = hci_cmd_sync_alloc(hdev, opcode, plen, param, NULL);
 	if (!skb) {
 		bt_dev_err(hdev, "no memory for command");
```

Diff review:
- No resource leak: returned before any allocation.
- No lock imbalance: no locks held at this point.
- No NULL/uninitialized dereference: `hdev` is always non-NULL at this point.
- No error-path coverage gap: `hci_cmd_sync_alloc` failure is still handled.
- Caller side effects: `hci_conn_auth` ignores the return value of
  `hci_send_cmd`; returning `-ENETDOWN` is safe. All other callers that
  propagate the error already handle negative errno.
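
To illustrate the ordering argument in the review above, here is a small userspace model of the guarded function. All names are hypothetical; the counters merely stand in for the allocation and `queue_work()` calls in `hci_send_cmd()`, and `ENETDOWN` comes from the host's `<errno.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct hci_dev used here. */
struct model_hdev {
    bool up;      /* models test_bit(HCI_UP, &hdev->flags) */
    int  allocs;  /* command buffers allocated */
    int  queued;  /* cmd_work queue attempts */
};

/* Guarded send: return before allocating or queueing when the device is down. */
static int model_send_cmd_guarded(struct model_hdev *h)
{
    if (!h->up)
        return -ENETDOWN;

    h->allocs++;  /* models hci_cmd_sync_alloc() and skb_queue_tail() */
    h->queued++;  /* models queue_work(hdev->workqueue, &hdev->cmd_work) */
    return 0;
}
```

Because the check comes first, a call on a device that is down leaves both counters at zero, so there is nothing to free and nothing reaches the draining workqueue.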

### Patch

- **Status**: Succeeded
- **Base commit**: `b4e07588e743` (exact — PATCH_BASE matched HEAD)
- **Validation**: `git apply --check` passed cleanly (exact, no fuzz)
- **Output files**:
  - `patch-email.txt` — LKML-ready mbox patch email
  - `git-send-email.sh` — ready-to-run send script

---

## Bug Introduction

`queue_work(hdev->workqueue, &hdev->cmd_work)` at `hci_core.c:3111` has
been present without an `HCI_UP` guard since at least commit
[c347b765fe70](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c347b765fe70d65ef0a2b1be8db91e5e84f73f4a)
("Bluetooth: Move command task to workqueue", Gustavo Padovan, 2011-12-14), which
introduced `queue_work` calls against `hdev->workqueue`. The `hci_send_cmd`
function itself dates to the original 2.6 import
([1da177e4c3f4](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1da177e4c3f41524e886b7f1b8a0c1fc7321cac2)).

This is a long-standing gap: the `HCI_UP` guard was added to many paths but
not to `hci_send_cmd`. syzbot surfaced the resulting WARNING while fuzzing
Bluetooth L2CAP teardown scenarios.

No upstream fix commit was identified within the search budget that
specifically adds the `HCI_UP` check to `hci_send_cmd`.

| Field | Value | Implication |
| ----- | ----- | ----------- |
| INTRODUCED-BY | [c347b765fe70](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c347b765fe70d65ef0a2b1be8db91e5e84f73f4a) | `queue_work` on `hdev->workqueue` added without HCI_UP guard |


## Fact Check

- **⚠️ Commit subject discrepancy**: c347b765fe70 actual subject is "Move command task to workqueue" (not "use module workqueue" as paraphrased in introduction).
- **⚠️ Line number offset**: Line 5353 reference should be line 5354 for `drain_workqueue(hdev->workqueue)` in `hci_dev_close_sync()`.

(Detailed factcheck log: see `factcheck.md` in the same directory)

---

## Patch Review

**Verdict: PASS**

All checklist items verified. No serious issues found.

- The patch guards `hci_send_cmd()` against device shutdown by checking the `HCI_UP` flag before any allocation or queueing operation occurs.
- The guard precedes the first risky operation (line 3099: `hci_cmd_sync_alloc()`), preventing resource leaks.
- No lock imbalance or NULL dereference issues are introduced.
- Error handling is correct: callers either ignore the return value or already tolerate negative errno codes, so adding `-ENETDOWN` to the possible return set breaks no existing contracts.
- The fix matches the pattern already used in `hci_recv_frame()` (line 2922) and `hci_dev_ioctl()` (line 596).
- The guard placement is effective because `HCI_UP` is cleared *before* `drain_workqueue()` in `hci_dev_close_sync()`, so the check catches the race at the point where it occurs.

## Fact Check

All checked items verified. Marker normalisation applied to align with reporting standards.