This crash was reported on LKML by Chris Arges <carges@cloudflare.com> on 2026-02-04. The LKML post contains the oops, a crash-tool analysis, and disassembly. See aYN3JC_Kdgw5G2Ik@861G6M3.
| Field | Value | Implication |
|---|---|---|
| MSGID | <aYN3JC_Kdgw5G2Ik@861G6M3> | |
| MSGID_URL | aYN3JC_Kdgw5G2Ik@861G6M3 | |
| UNAME | 6.18.7-cloudflare-2026.1.15 | Custom Cloudflare kernel based on v6.18.7 stable |
| BUILD | #1 PREEMPT(voluntary) | No -Ubuntu, no .fcNN suffix; custom kernel |
| DISTRO | none (custom Cloudflare) | No distro-specific debug package available |
| PROCESS | journalctl | Userspace journal reader triggering a page fault |
| PID | 3666669 | |
| CPU | 7 | |
| CRASH_TYPE | VM_BUG_ON_FOLIO | mm-subsystem assertion; crash site is mm/filemap.c:3519 |
| BUG_CONDITION | !folio_contains(folio, index) | The folio at index 0x7652 does not contain the fault index 0x7653 |
| TAINT | G, W, O | W=prior warning; O=out-of-tree modules loaded |
| HARDWARE | Lenovo HR355M-V3-G12/HR355M_V3_HPM | |
| BIOS | HR355M_V3.G.031 02/17/2025 | |
| SOURCEDIR | oops-workdir/linux (checked out at tag v6.18.7) | Mainline v6.18.7; Cloudflare may carry patches, but line numbers are good approximations |
| VMLINUX | not available | Custom kernel, no debug package; addr2line not possible |
The `Modules linked in:` line is not present in the reported oops (likely truncated). The taint flag O confirms out-of-tree modules were loaded.

| Module | Flags | Backtrace | Location | Flag Implication |
|---|---|---|---|---|
| (module list not available in this report) | | | | |
The RIP register value is ffffffff94b3ace1; offset 0xa61 = 2657; base = ffffffff94b3ace1 − 0xa61 = ffffffff94b3a280. Six "?" entries (including srso_alias_return_thunk and do_mmap) are excluded because there are more than 2 high-confidence entries.
| Address | Function | Offset | Size | Context | Module | Source location |
|---|---|---|---|---|---|---|
| ffffffff94b3ace1 (ffffffff94b3a280 + 0xa61) | filemap_fault | 0xa61 | 0x1410 | Task | (built-in) | mm/filemap.c:3519 |
| | __do_fault | 0x31 | 0xd0 | Task | (built-in) | mm/memory.c:5282 |
| | do_fault | 0x2e6 | 0x710 | Task | (built-in) | mm/memory.c:5849 |
| | __handle_mm_fault | 0x7b3 | 0xe50 | Task | (built-in) | mm/memory.c |
| | handle_mm_fault | 0xaa | 0x2a0 | Task | (built-in) | mm/memory.c |
| | do_user_addr_fault | 0x208 | 0x660 | Task | (built-in) | arch/x86/mm/fault.c |
| | exc_page_fault | 0x77 | 0x170 | Task | (built-in) | arch/x86/mm/fault.c |
| | asm_exc_page_fault | 0x26 | 0x30 | Task | (built-in) | arch/x86/entry/entry_64.S |
```
RAX: 0000000000000043 RBX: ff25825437d792a8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ff2582406fb9c4c0
RBP: 0000000000007653 R08: 0000000000000000 R09: ff4ac5c342ccfb48
R10: ff2582986cc3ffa8 R11: 0000000000000003 R12: 0000000000000000
R13: ff258239e9fbf740 R14: ff25825437d79138 R15: ff4ac5c342ccfde8
RIP: ffffffff94b3ace1 RSP: ff4ac5c342ccfcb0
EFLAGS: 00010246
CR2: 00007efd7ec53a08 CR3: 00000021f5891005 CR4: 0000000000771ef0
```
Interpretation from the crash tool analysis provided in the email:
| Register | Likely variable | Corroboration |
|---|---|---|
| RBP | index = 0x7653 | `crash> p -x index` → `$10 = 0x7653` |
| RBX | mapping = ff25825437d792a8 | `crash> p mapping` → `$2 = (struct address_space *) 0xff25825437d792a8` |
| RDI | folio = ff2582406fb9c4c0 | `crash> struct folio.mapping 0xff2582406fb9c4c0` → `mapping = 0x0` |
| R13 | file = ff258239e9fbf740 | `crash> files 3666669` matches the journal file |
| R14 | inode = ff25825437d79138 | `crash> struct inode.i_ino -x ff25825437d79138` → `i_ino = 0xc0000c0` (matches page dump) |
Key observations:

- RBP=0x7653 = `index` — the page offset we are faulting on.
- RDI=`folio`, `folio->mapping`=NULL — the folio retrieved from the page cache has had its mapping cleared.
- The page dump says `index:0x7652` — the folio's own page-cache index is 0x7652, not 0x7653. Because the folio appears to be order-0 (single page), `folio_contains(folio, 0x7653)` returns false.
- The check at line 3514 (`folio->mapping != mapping`) did NOT catch this: at the time of that check, `folio->mapping` matched `mapping`. The mapping was cleared by a concurrent thread between that check and the BUG assertion — a classic narrow race window.
```
Code: 48 8b 4c 24 10 4c 8b 44 24 08 48 85 c9 0f 84 82 fa ff ff 49 89 cd
e9 bc f9 ff ff 48 c7 c6 20 44 d0 96 4c 89 c7 e8 3f 1c 04 00 <0f> 0b
48 8d 7b 18 4c 89 44 24 08 4c 89 1c 24 e8 0b 97 e3 ff 4c 8b
```

`<0f> 0b` = UD2 — the x86 undefined-instruction trap used by `BUG()` / `VM_BUG_ON_FOLIO()`. Crash type confirmed: BUG variant. The `dump_page` call appears after the UD2 in the disassembly, confirming this is the compiler's outlined cold path for the `VM_BUG_ON_FOLIO` block.
**filemap_fault — crash site (mm/filemap.c:3519)**

```c
3447 vm_fault_t filemap_fault(struct vm_fault *vmf)
3448 {
3449 	int error;
3450 	struct file *file = vmf->vma->vm_file;
3451 	struct file *fpin = NULL;
3452 	struct address_space *mapping = file->f_mapping;
3453 	struct inode *inode = mapping->host;
3454 	pgoff_t max_idx, index = vmf->pgoff;
...
	// First attempt: is there already a folio in the page cache?
3468 	folio = filemap_get_folio(mapping, index);
3469 	if (likely(!IS_ERR(folio))) {
...
	// Fast path: folio found; async readahead, check uptodate
...
3490 retry_find:
...
	// Slow path: folio not in cache; create one
3499 	folio = __filemap_get_folio(mapping, index,
3500 				    FGP_CREAT|FGP_FOR_MMAP,
3501 				    vmf->gfp_mask);
...
3510 	if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
3511 		goto out_retry;
3512
3513 	/* Did it get truncated? */
3514 	if (unlikely(folio->mapping != mapping)) {	// ← check passes (race window opens here)
3515 		folio_unlock(folio);
3516 		folio_put(folio);
3517 		goto retry_find;
3518 	}
3519 	VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);	// ← CRASH HERE
```

At the crash site:

- `index = 0x7653` (from RBP and `crash> p -x index`)
- `folio = ff2582406fb9c4c0`, `folio->index = 0x7652`, `folio->mapping = NULL`
- `folio_contains(folio, 0x7653)` = (0x7653 − 0x7652 < folio_nr_pages(folio)) = (1 < 1) = false → BUG fires

The check at line 3514 should have caught a NULL mapping, but the mapping was cleared by a concurrent CPU between line 3514 and line 3519 (see How analysis).
**__do_fault — call site (mm/memory.c:5282)**

```c
5254 static vm_fault_t __do_fault(struct vm_fault *vmf)
5255 {
5256 	struct vm_area_struct *vma = vmf->vma;
5257 	struct folio *folio;
5258 	vm_fault_t ret;
...
5282 	ret = vma->vm_ops->fault(vmf);	// ← call here → filemap_fault()
```

**do_fault — call site (mm/memory.c:5849)**

```c
5820 static vm_fault_t do_fault(struct vm_fault *vmf)
5821 {
5822 	struct vm_area_struct *vma = vmf->vma;
5823 	struct mm_struct *vm_mm = vma->vm_mm;
5824 	vm_fault_t ret;
...
	// Read fault path (journalctl is reading, not writing):
5849 	} else if (!(vmf->flags & FAULT_FLAG_WRITE))
5850 		ret = do_read_fault(vmf);	// ← call here → __do_fault()
```

What happened: In filemap_fault() at
mm/filemap.c:3519, the assertion
VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio)
fired. The folio retrieved from the XFS page cache for file index 0x7653
has folio->index = 0x7652 and
folio_nr_pages() = 1 (single page), so
folio_contains(folio, 0x7653) evaluates to false.
Additionally, folio->mapping = NULL at crash time,
which means the folio was removed from the page cache between the
truncation guard at line 3514 and the BUG assertion at line 3519 — a
very narrow race window.
The crash occurred while journalctl (PID 3666669) was
reading a journal file from an XFS filesystem under memory pressure. The
system was running a custom Cloudflare kernel
6.18.7-cloudflare-2026.1.15. The reporter noted the same
crash on both x86_64 and arm64 systems, confirming this is an
architecture-independent software bug.
How (positive finding): a race condition in __split_unmapped_folio().
Q1: How can folio_contains(folio, 0x7653) be false for a folio with folio->index = 0x7652?

A1: folio_contains(folio, index) is defined as:

```c
return index - folio->index < folio_nr_pages(folio);
// = 0x7653 - 0x7652 < folio_nr_pages(folio)
// = 1 < folio_nr_pages(folio)
```

This can only be false if folio_nr_pages(folio) <= 1, i.e. the folio is a single-page (order-0) folio. Yet filemap_get_folio(mapping, 0x7653) returned this folio — it was returned because at the time of the XArray lookup, the folio was a large folio (order ≥ 1) starting at index 0x7652, which did contain 0x7653. By the time the assertion is checked, the folio appears single-page. This points to a folio split having occurred between the XArray lookup and the assertion.
Q2: How can a folio that covered 0x7652–0x7653 end up as a single-page folio at 0x7652?
A2: A concurrent thread was performing a non-uniform THP
split of the large folio (e.g. triggered by XFS hole-punching
or truncation under memory pressure). The bug is in
__split_unmapped_folio() in
mm/huge_memory.c:3472:
```c
3431 static int __split_unmapped_folio(struct folio *folio, int new_order,
3432 		struct page *split_at, struct xa_state *xas,
3433 		struct address_space *mapping, bool uniform_split)
3434 {
3435 	int order = folio_order(folio);
3436 	int start_order = uniform_split ? new_order : order - 1;
...
	// Non-uniform split: iterates, splitting one order at a time.
	// The 'folio' variable is MODIFIED in each loop iteration.
3451 	for (split_order = start_order;
3452 	     split_order >= new_order && !stop_split;
3453 	     split_order--) {
...
3462 		if (mapping) {
3463 			/*
3464 			 * uniform split has xas_split_alloc() called before
3465 			 * irq is disabled to allocate enough memory, whereas
3466 			 * non-uniform split can handle ENOMEM.
3467 			 */
3468 			if (uniform_split)
3469 				xas_split(xas, folio, old_order);
3470 			else {
3471 				xas_set_order(xas, folio->index, split_order);
3472 				xas_try_split(xas, folio, old_order);	// ← BUG: folio changes each iteration!
3473 				if (xas_error(xas)) {
3474 					ret = xas_error(xas);
3475 					stop_split = true;
3476 				}
3477 			}
3478 		}
```

During a non-uniform split (the loop splits one order at a time, e.g. order-3 → order-2 → order-1 → order-0), the variable folio is reassigned on each iteration to point to the sub-folio being further split. When xas_try_split(xas, folio, old_order) is called with the intermediate sub-folio (not the original large folio), the XArray places intermediate after-split folios at incorrect positions — before the final correct folios are installed.

Concretely (as documented in the fix commit), the XArray momentarily looks like:

```
+----------------+---------+----+----+
| f              | f2      | f3 | f3 |   ← f3 at wrong slot!
+----------------+---------+----+----+
```
A parallel filemap_get_entry() (called from
filemap_get_folio()) can grab this misplaced
f3 folio via folio_try_get() during the window
when it is “unfrozen” but still at the wrong slot. The grabbed folio
f3 has index = 0x7652 (sub-folio index within
the original large folio) but appears single-page after the split, so it
does not contain the original fault index 0x7653. By the
time the caller (filemap_fault) reaches line 3519, the split has
completed and the folio’s mapping may have been cleared — or the folio
is simply incompatible with the requested index.
Q3: Why did the guard at filemap_fault:3514
(folio->mapping != mapping) not prevent the
crash?
A3: This guard checks whether the folio was truncated (mapping
cleared), not whether it is at the correct XArray slot. During a folio
split, the original large folio’s mapping is not necessarily cleared
immediately. The misplaced sub-folio f3 has the same
mapping pointer (it inherits it from the original large
folio), so the guard at 3514 passes. The BUG at 3519 is the first check
that catches the index mismatch.
Note: folio->mapping = NULL seen in the post-crash
kdump reflects the state after the split completed and cleanup
ran; at the instant of the BUG, the mapping check passed, which is why
we reached 3519.
**Root fix — mm/huge_memory.c:__split_unmapped_folio()**

The bug is fixed by commit 577a1f495fd7 (“mm/huge_memory: fix a folio_split() race condition with folio_try_get()”, Zi Yan, 2026-03-02).

The fix saves the original large folio in a local variable old_folio and always passes old_folio (not the loop variable folio) to xas_try_split():

```diff
+	struct folio *old_folio = folio;
 ...
-	xas_try_split(xas, folio, old_order);
+	xas_try_split(xas, old_folio, old_order);
```

By using the original frozen folio in xas_try_split, parallel folio_try_get() callers see the original folio in the XArray and will wait on it (it is frozen) until the split is fully committed and the correct after-split folios are installed at their proper indices. This eliminates the window where misplaced folios are visible.
Status: Commit 577a1f495fd7
landed in v7.0-rc1 and carries Cc: stable@vger.kernel.org.
It has not been backported to the v6.18.x stable series
as of v6.18.12. The affected kernel
6.18.7-cloudflare-2026.1.15 is vulnerable.
A companion test was also added: commit 224f1292615 (“selftests/mm: add folio_split() and filemap_get_entry() race test”, Zi Yan, 2026-03-23).
Defensive fix: The check at
mm/filemap.c:3514 could be extended to also verify
folio_contains(folio, index) before asserting it, and retry
via goto retry_find if it fails. However, this masks the
underlying race rather than fixing it, and may silently hide data
corruption (wrong folio read). The root fix in huge_memory.c is strongly
preferred.
Note on data corruption: The commit message for 577a1f495fd7
explicitly notes that misplaced folios “returned to userspace at wrong
file indices” cause data corruption, not just a crash.
The BUG assertion in the reporter’s case is actually the kernel
catching the corruption before it propagates; on kernels
without CONFIG_DEBUG_VM, the same race would silently
return wrong data to userspace.
Root cause confirmed: A race condition between
folio_try_get() (used in filemap_get_entry())
and the non-uniform THP split path in
__split_unmapped_folio()
(mm/huge_memory.c:3472). During a non-uniform split, the
XArray momentarily contains after-split folios at incorrect indices; a
concurrent page-fault handler grabs one of these misplaced folios and
subsequently triggers the
VM_BUG_ON_FOLIO(!folio_contains(folio, index)) assertion in
filemap_fault().
Confidence: High. The upstream fix commit 577a1f495fd7
matches the exact crash pattern (same assertion, same XFS/pagecache
context, same large-folio split trigger), was written in response to a
similar reproducer, and is tagged
Cc: stable@vger.kernel.org.
Recommendations:

1. **Backport commit 577a1f495fd7 to the v6.18.x Cloudflare kernel.** This is a one-line change (the `old_folio` variable) in mm/huge_memory.c and should apply cleanly to v6.18.7.
2. **Verify folio_split() is active in the Cloudflare kernel.** The non-uniform split path (folio_split()) is used by XFS truncation when large folios are enabled (mapping_large_folio_support()). If the Cloudflare kernel has large/huge folios for XFS disabled, the race cannot be triggered.
3. **Check the kernel.org stable tree.** The commit carries Cc: stable@vger.kernel.org and should appear in a future v6.18.x stable release. Monitor https://cdn.kernel.org/pub/linux/kernel/v6.x/ for a v6.18.13 or later release that contains this fix.
4. **Note on silent corruption risk.** The reporter's kernel has CONFIG_DEBUG_VM enabled (VM_BUG_ON_FOLIO is active). Production kernels without CONFIG_DEBUG_VM would not crash but could silently serve wrong file data to journalctl. The BUG assertion is actually protective in this case.