# Linux kernel crash report

This crash was reported on LKML by Chris Arges on 2026-02-04. The LKML post contains the oops, a crash tool analysis, and disassembly. See [aYN3JC_Kdgw5G2Ik@861G6M3](https://lore.kernel.org/lkml/aYN3JC_Kdgw5G2Ik@861G6M3/).

## Key elements

| Field | Value | Implication |
| ----- | ----- | ----------- |
| MSGID | `` | |
| MSGID_URL | [aYN3JC_Kdgw5G2Ik@861G6M3](https://lore.kernel.org/lkml/aYN3JC_Kdgw5G2Ik@861G6M3/) | |
| UNAME | `6.18.7-cloudflare-2026.1.15` | Custom Cloudflare kernel based on v6.18.7 stable |
| BUILD | `#1 PREEMPT(voluntary)` | No `-Ubuntu`, no `.fcNN` suffix — custom kernel |
| DISTRO | none (custom Cloudflare) | No distro-specific debug package available |
| PROCESS | `journalctl` | Userspace journal reader triggering a page fault |
| PID | 3666669 | |
| CPU | 7 | |
| CRASH_TYPE | `VM_BUG_ON_FOLIO` | mm-subsystem assertion; crash site is `mm/filemap.c:3519` |
| BUG_CONDITION | `!folio_contains(folio, index)` | The folio at index 0x7652 does not contain the fault index 0x7653 |
| TAINT | G, W, O | W=prior warning; O=out-of-tree modules loaded |
| HARDWARE | Lenovo HR355M-V3-G12/HR355M_V3_HPM | |
| BIOS | HR355M_V3.G.031 02/17/2025 | |
| SOURCEDIR | `oops-workdir/linux` (checked out at tag `v6.18.7`) | Mainline v6.18.7; Cloudflare may carry patches, but line numbers are good approximations |
| VMLINUX | not available | Custom kernel, no debug package; addr2line not possible |

## Kernel modules

The `Modules linked in:` line is not present in the reported oops (likely truncated). The taint flag `O` confirms out-of-tree modules were loaded.

| Module | Flags | Backtrace | Location | Flag Implication |
| ------ | ----- | --------- | -------- | ---------------- |
| *(module list not available in this report)* | | | | |

## Backtrace

The RIP register value is `ffffffff94b3ace1`; offset `0xa61` = 2657; base = `ffffffff94b3ace1 − 0xa61 = ffffffff94b3a280`. Six "?"
entries (e.g. `srso_alias_return_thunk`, `do_mmap`) are excluded because there are more than 2 high-confidence entries.

| Address | Function | Offset | Size | Context | Module | Source location |
| ------- | -------- | ------ | ---- | ------- | ------ | --------------- |
| `ffffffff94b3ace1 (ffffffff94b3a280 + 0xa61)` | `filemap_fault` | `0xa61` | `0x1410` | Task | *(built-in)* | [mm/filemap.c:3519](#1-filemap_fault--crash-site-mmfilemapc3519) |
| | `__do_fault` | `0x31` | `0xd0` | Task | *(built-in)* | [mm/memory.c:5282](#2-__do_fault--call-site-mmmemoryc5282) |
| | `do_fault` | `0x2e6` | `0x710` | Task | *(built-in)* | [mm/memory.c:5849](#3-do_fault--call-site-mmmemoryc5849) |
| | `__handle_mm_fault` | `0x7b3` | `0xe50` | Task | *(built-in)* | `mm/memory.c` |
| | `handle_mm_fault` | `0xaa` | `0x2a0` | Task | *(built-in)* | `mm/memory.c` |
| | `do_user_addr_fault` | `0x208` | `0x660` | Task | *(built-in)* | `arch/x86/mm/fault.c` |
| | `exc_page_fault` | `0x77` | `0x170` | Task | *(built-in)* | `arch/x86/mm/fault.c` |
| | `asm_exc_page_fault` | `0x26` | `0x30` | Task | *(built-in)* | `arch/x86/entry/entry_64.S` |

## CPU registers

```
RAX: 0000000000000043  RBX: ff25825437d792a8  RCX: 0000000000000000
RDX: 0000000000000000  RSI: 0000000000000001  RDI: ff2582406fb9c4c0
RBP: 0000000000007653  R08: 0000000000000000  R09: ff4ac5c342ccfb48
R10: ff2582986cc3ffa8  R11: 0000000000000003  R12: 0000000000000000
R13: ff258239e9fbf740  R14: ff25825437d79138  R15: ff4ac5c342ccfde8
RIP: ffffffff94b3ace1  RSP: ff4ac5c342ccfcb0  EFLAGS: 00010246
CR2: 00007efd7ec53a08  CR3: 00000021f5891005  CR4: 0000000000771ef0
```

Interpretation from the crash tool analysis provided in the email:

| Register | Likely variable | Corroboration |
| -------- | --------------- | ------------- |
| RBP | `index` = 0x7653 | `crash> p -x index` → `$10 = 0x7653` |
| RBX | `mapping` = ff25825437d792a8 | `crash> p mapping` → `$2 = (struct address_space *) 0xff25825437d792a8` |
| RDI | `folio` = ff2582406fb9c4c0 | `crash> struct folio.mapping 0xff2582406fb9c4c0` → `mapping = 0x0` |
| R13 | `file` = ff258239e9fbf740 | `crash> files 3666669` — matches the journal file |
| R14 | `inode` = ff25825437d79138 | `crash> struct inode.i_ino -x ff25825437d79138` → `i_ino = 0xc0000c0` (matches page dump) |

Key observations:

- **RBP=0x7653** = `index` — the page offset we are faulting on.
- **RDI=folio, folio→mapping=NULL** — the folio retrieved from the page cache has had its mapping cleared.
- The page dump says `index:0x7652` — the folio's own page-cache index is 0x7652, not 0x7653. Because the folio appears to be order-0 (single page), `folio_contains(folio, 0x7653)` returns false.
- The check at line 3514 (`folio->mapping != mapping`) did NOT catch this: at the time of that check, `folio->mapping` matched `mapping`. The mapping was cleared by a concurrent thread between that check and the BUG assertion — a classic narrow race window.

## Code bytes

```
Code: 48 8b 4c 24 10 4c 8b 44 24 08 48 85 c9 0f 84 82 fa ff ff 49 89 cd e9 bc f9 ff ff 48 c7 c6 20 44 d0 96 4c 89 c7 e8 3f 1c 04 00 <0f> 0b 48 8d 7b 18 4c 89 44 24 08 4c 89 1c 24 e8 0b 97 e3 ff 4c 8b
```

`<0f> 0b` = `UD2` — the x86 undefined-instruction trap used by `BUG()` / `VM_BUG_ON_FOLIO()`. Crash type confirmed: BUG variant. The `dump_page` call appears after the UD2 in the disassembly, confirming this is the compiler's outlined cold path for the VM_BUG_ON_FOLIO block.

## Backtrace source code

### 1. `filemap_fault` — crash site (`mm/filemap.c:3519`)

[mm/filemap.c at v6.18.7](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/filemap.c?h=v6.18.7#n3447)

```c
3447 vm_fault_t filemap_fault(struct vm_fault *vmf)
3448 {
3449 	int error;
3450 	struct file *file = vmf->vma->vm_file;
3451 	struct file *fpin = NULL;
3452 	struct address_space *mapping = file->f_mapping;
3453 	struct inode *inode = mapping->host;
3454 	pgoff_t max_idx, index = vmf->pgoff;
     ...
     	// First attempt: is there already a folio in the page cache?
3468 	folio = filemap_get_folio(mapping, index);
3469 	if (likely(!IS_ERR(folio))) {
     ...
     	// Fast path: folio found; async readahead, check uptodate
     ...
3490 retry_find:
     ...
     	// Slow path: folio not in cache; create one
3499 	folio = __filemap_get_folio(mapping, index,
3500 				    FGP_CREAT|FGP_FOR_MMAP,
3501 				    vmf->gfp_mask);
     ...
3510 	if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
3511 		goto out_retry;
3512
3513 	/* Did it get truncated? */
3514 	if (unlikely(folio->mapping != mapping)) {	// ← check passes (race window opens here)
3515 		folio_unlock(folio);
3516 		folio_put(folio);
3517 		goto retry_find;
3518 	}
3519 	VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);	// ← CRASH HERE
```

At the crash site:

- `index` = 0x7653 (from RBP and `crash> p -x index`)
- `folio` = `ff2582406fb9c4c0`, `folio->index` = 0x7652, `folio->mapping` = NULL
- `folio_contains(folio, 0x7653)` = `0x7653 − 0x7652 < folio_nr_pages(folio)` = `1 < 1` = **false** → BUG fires

The check at line 3514 should have caught a NULL mapping, but the mapping was cleared by a concurrent CPU between line 3514 and line 3519 (see the How analysis).

### 2. `__do_fault` — call site (`mm/memory.c:5282`)

[mm/memory.c at v6.18.7](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/memory.c?h=v6.18.7#n5254)

```c
5254 static vm_fault_t __do_fault(struct vm_fault *vmf)
5255 {
5256 	struct vm_area_struct *vma = vmf->vma;
5257 	struct folio *folio;
5258 	vm_fault_t ret;
     ...
5282 	ret = vma->vm_ops->fault(vmf);	// ← call here → filemap_fault()
```

### 3. `do_fault` — call site (`mm/memory.c:5849`)

[mm/memory.c at v6.18.7](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/memory.c?h=v6.18.7#n5820)

```c
5820 static vm_fault_t do_fault(struct vm_fault *vmf)
5821 {
5822 	struct vm_area_struct *vma = vmf->vma;
5823 	struct mm_struct *vm_mm = vma->vm_mm;
5824 	vm_fault_t ret;
     ...
     	// Read fault path (journalctl is reading, not writing):
5849 	} else if (!(vmf->flags & FAULT_FLAG_WRITE))
5850 		ret = do_read_fault(vmf);	// ← call here → __do_fault()
```

## What-how-where analysis

### What

**What happened:** In `filemap_fault()` at `mm/filemap.c:3519`, the assertion `VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio)` fired. The folio retrieved from the XFS page cache for file index 0x7653 has `folio->index = 0x7652` and `folio_nr_pages() = 1` (single page), so `folio_contains(folio, 0x7653)` evaluates to false.

Additionally, `folio->mapping = NULL` at crash time, which means the folio was removed from the page cache between the truncation guard at line 3514 and the BUG assertion at line 3519 — a very narrow race window.

The crash occurred while `journalctl` (PID 3666669) was reading a journal file from an XFS filesystem under memory pressure. The system was running a custom Cloudflare kernel `6.18.7-cloudflare-2026.1.15`. The reporter noted the same crash on both x86_64 and arm64 systems, confirming this is an architecture-independent software bug.

### How

**Positive How — a race condition in `__split_unmapped_folio()`.**

**Q1: How can `folio_contains(folio, 0x7653)` be false for a folio with `folio->index = 0x7652`?**

A1: `folio_contains(folio, index)` is defined as:

```c
return index - folio->index < folio_nr_pages(folio);
// = 0x7653 - 0x7652 < folio_nr_pages(folio)
// = 1               < folio_nr_pages(folio)
```

This can only be false if `folio_nr_pages(folio) <= 1`, i.e. the folio is a single-page (order-0) folio. Yet `filemap_get_folio(mapping, 0x7653)` returned this folio — it was returned because at the time of the XArray lookup, the folio was a **large folio** (order ≥ 1) starting at index 0x7652, which did contain 0x7653. By the time the assertion is checked, the folio appears single-page. This points to a **folio split** having occurred between the XArray lookup and the assertion.
**Q2: How can a folio that covered 0x7652–0x7653 end up as a single-page folio at 0x7652?**

A2: A concurrent thread was performing a **non-uniform THP split** of the large folio (e.g. triggered by XFS hole-punching or truncation under memory pressure). The bug is in `__split_unmapped_folio()` in `mm/huge_memory.c:3472`:

[mm/huge_memory.c at v6.18.7](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.18.7#n3431)

```c
3431 static int __split_unmapped_folio(struct folio *folio, int new_order,
3432 		struct page *split_at, struct xa_state *xas,
3433 		struct address_space *mapping, bool uniform_split)
3434 {
3435 	int order = folio_order(folio);
3436 	int start_order = uniform_split ? new_order : order - 1;
     ...
     	// Non-uniform split: iterates, splitting one order at a time.
     	// The 'folio' variable is MODIFIED in each loop iteration.
3451 	for (split_order = start_order;
3452 	     split_order >= new_order && !stop_split;
3453 	     split_order--) {
     ...
3462 		if (mapping) {
3463 			/*
3464 			 * uniform split has xas_split_alloc() called before
3465 			 * irq is disabled to allocate enough memory, whereas
3466 			 * non-uniform split can handle ENOMEM.
3467 			 */
3468 			if (uniform_split)
3469 				xas_split(xas, folio, old_order);
3470 			else {
3471 				xas_set_order(xas, folio->index, split_order);
3472 				xas_try_split(xas, folio, old_order);	// ← BUG: folio changes each iteration!
3473 				if (xas_error(xas)) {
3474 					ret = xas_error(xas);
3475 					stop_split = true;
3476 				}
3477 			}
3478 		}
```

During a **non-uniform split** (the loop splits one order at a time, e.g. order-3 → order-2 → order-1 → order-0), the variable `folio` is reassigned on each iteration to point to the sub-folio being further split. When `xas_try_split(xas, folio, old_order)` is called with the *intermediate* sub-folio (not the original large folio), the XArray places intermediate after-split folios at incorrect positions — before the final correct folios are installed.
Concretely (as documented in the fix commit), the XArray momentarily looks like:

```
+----------------+---------+----+----+
| f              | f2      | f3 | f3 |   ← f3 at wrong slot!
+----------------+---------+----+----+
```

A parallel `filemap_get_entry()` (called from `filemap_get_folio()`) can grab this misplaced `f3` folio via `folio_try_get()` during the window when it is "unfrozen" but still at the wrong slot. The grabbed folio `f3` has `index = 0x7652` (the sub-folio's index within the original large folio) but appears single-page after the split, so it does not contain the original fault index `0x7653`. By the time the caller (`filemap_fault`) reaches line 3519, the split has completed and the folio's mapping may have been cleared — or the folio is simply incompatible with the requested index.

**Q3: Why did the guard at `filemap_fault:3514` (`folio->mapping != mapping`) not prevent the crash?**

A3: This guard checks whether the folio was truncated (mapping cleared), not whether it is at the correct XArray slot. During a folio split, the original large folio's mapping is not necessarily cleared immediately. The misplaced sub-folio `f3` has the same `mapping` pointer (it inherits it from the original large folio), so the guard at 3514 passes. The BUG at 3519 is the first check that catches the index mismatch.

Note: the `folio->mapping = NULL` seen in the post-crash kdump reflects the state *after* the split completed and cleanup ran; at the instant of the BUG, the mapping check passed, which is why execution reached line 3519.
### Where

**Root fix — `mm/huge_memory.c:__split_unmapped_folio()`**

The bug is fixed by commit [577a1f495fd7](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=577a1f495fd78d8fb61b67ac3d3b595b01f6fcb0) (*"mm/huge_memory: fix a folio_split() race condition with folio_try_get()"*, Zi Yan, 2026-03-02). The fix saves the **original** large folio in a local variable `old_folio` and always passes `old_folio` (not the loop variable `folio`) to `xas_try_split()`:

```diff
+	struct folio *old_folio = folio;
 	...
-				xas_try_split(xas, folio, old_order);
+				xas_try_split(xas, old_folio, old_order);
```

By using the original frozen folio in `xas_try_split()`, parallel `folio_try_get()` callers see the original folio in the XArray and wait on it (it is frozen) until the split is fully committed and the correct after-split folios are installed at their proper indices. This eliminates the window in which misplaced folios are visible.

**Status:** Commit [577a1f495fd7](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=577a1f495fd78d8fb61b67ac3d3b595b01f6fcb0) landed in v7.0-rc1 and carries `Cc: stable@vger.kernel.org`. It has **not** been backported to the v6.18.x stable series as of v6.18.12. The affected kernel `6.18.7-cloudflare-2026.1.15` is vulnerable. A companion test was also added: commit [224f1292615](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=224f1292615079d604651915a214f9e5ace9e41c) (*"selftests/mm: add folio_split() and filemap_get_entry() race test"*, Zi Yan, 2026-03-23).

**Defensive fix:** The check at `mm/filemap.c:3514` could be extended to also verify `folio_contains(folio, index)` before asserting it, and to retry via `goto retry_find` if it fails. However, this masks the underlying race rather than fixing it, and may silently hide data corruption (a wrong folio being read). The root fix in `huge_memory.c` is strongly preferred.
**Note on data corruption:** The commit message for [577a1f495fd7](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=577a1f495fd78d8fb61b67ac3d3b595b01f6fcb0) explicitly notes that misplaced folios "returned to userspace at wrong file indices" cause **data corruption**, not just a crash. The BUG assertion in the reporter's case is actually the kernel *catching* the corruption before it propagates; on kernels without `CONFIG_DEBUG_VM`, the same race would silently return wrong data to userspace.

## Analysis, conclusions and recommendations

**Root cause confirmed:** A race condition between `folio_try_get()` (used in `filemap_get_entry()`) and the non-uniform THP split path in `__split_unmapped_folio()` (`mm/huge_memory.c:3472`). During a non-uniform split, the XArray momentarily contains after-split folios at incorrect indices; a concurrent page-fault handler grabs one of these misplaced folios and subsequently triggers the `VM_BUG_ON_FOLIO(!folio_contains(folio, index))` assertion in `filemap_fault()`.

**Confidence:** High. The upstream fix commit [577a1f495fd7](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=577a1f495fd78d8fb61b67ac3d3b595b01f6fcb0) matches the exact crash pattern (same assertion, same XFS/pagecache context, same large-folio split trigger), was written in response to a similar reproducer, and is tagged `Cc: stable@vger.kernel.org`.

**Recommendations:**

1. **Backport commit [577a1f495fd7](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=577a1f495fd78d8fb61b67ac3d3b595b01f6fcb0) to the v6.18.x Cloudflare kernel.** This is a tiny, self-contained change (introducing the `old_folio` variable) in `mm/huge_memory.c` and should apply cleanly to v6.18.7.
2. **Verify `folio_split()` is active in the Cloudflare kernel.** The non-uniform split path (`folio_split()`) is used by XFS truncation when large folios are enabled (`mapping_large_folio_support()`). If the Cloudflare kernel has large/huge folios for XFS disabled, the race cannot be triggered.
3. **Check the kernel.org stable tree.** The commit carries `Cc: stable@vger.kernel.org` and should appear in a future `v6.18.x` stable release. Monitor https://cdn.kernel.org/pub/linux/kernel/v6.x/ for a `v6.18.13` or later release that contains this fix.
4. **Note the silent-corruption risk.** The reporter's kernel has `CONFIG_DEBUG_VM` enabled (`VM_BUG_ON_FOLIO` is active). Production kernels without `CONFIG_DEBUG_VM` would not crash but could silently serve wrong file data to `journalctl`. The BUG assertion is actually protective in this case.