# Linux kernel crash report

Source: kernel Bugzilla bug [#221376](https://bugzilla.kernel.org/show_bug.cgi?id=221376): "AMD RADEON RX 9070 XT - modprobe amdgpu is fail."

## Key elements

| Field | Value | Implication |
| ----- | ----- | ----------- |
| BUG_URL | [bugzilla.kernel.org #221376](https://bugzilla.kernel.org/show_bug.cgi?id=221376) | |
| UNAME | `6.18.22-x86_64` | Custom kernel compiled by the reporter (build string: `#1 SMP PREEMPT_DYNAMIC Fri Apr 17 01:05:45 +07 2026`) |
| DISTRO | *(none; custom kernel)* | No distro debug packages available; vmlinux not obtainable |
| HARDWARE | GIGABYTE MD72-HB1-00 (dual-socket Intel Xeon Silver 4310 @ 2.10 GHz, 48 threads) | |
| BIOS | F40 12/09/2025 | |
| PROCESS | `kworker/12:1` | Kernel worker thread on CPU 12 |
| WORKQUEUE | `events work_for_cpu_fn` | Device probe via `local_pci_probe` run on a specific CPU |
| TAINT | *(Not tainted)* | Clean kernel; no proprietary or out-of-tree code |
| CRASH_TYPE | BUG/BUG_ON | `DRM_MM_BUG_ON(start + size <= start)` in `drm_mm_init()` |
| CRASH_SITE | `drivers/gpu/drm/drm_mm.c:930` | Assertion inside the DRM memory-range allocator initialiser |
| SOURCEDIR | `/sdb1/arjan/git/oops-skill/oops-workdir/linux` (tag `v6.18.16`) | Nearest stable tag; 6.18.22 not present, so line numbers are approximate |
| VMLINUX | *(not available; custom kernel)* | `addr2line` mapping not possible; source used directly |

## Kernel modules

| Module | Flags | Backtrace | Location | Flag Implication |
| ------ | ----- | --------- | -------- | ---------------- |
| amdgpu | + | Y | | Being loaded at time of crash (`module_init` path likely involved) |
| amdxcp | | | | |
| drm\_ttm\_helper | | | | |
| ttm | | Y | | |
| drm\_exec | | | | |
| drm\_panel\_backlight\_quirks | | | | |
| gpu\_sched | | | | |
| drm\_suballoc\_helper | | | | |
| drm\_buddy | | | | |
| drm\_display\_helper | | | | |
| cec | | | | |
| rc\_core | | | | |
| igb | | | | |
| i2c\_algo\_bit | | | | |

## Backtrace

| Address | Function | Offset | Size | Context | Module | Source location |
| ------- | -------- | ------ | ---- | ------- | ------ | --------------- |
| *(RIP)* | `drm_mm_init` | `0xc1` | `0xd0` | Task | *(built-in)* | [drm\_mm.c:930](#1-drm_mm_init--crash-site-drm_mmc930) |
| | `ttm_range_man_init_nocheck` | `0x9d` | `0x180` | Task | `ttm` (build-ID: e3a55dbbe0be) | [ttm\_range\_manager.c:198](#2-ttm_range_man_init_nocheck--call-site-ttm_range_managerc198) |
| | `amdgpu_ttm_init.cold` | `0x45e` | `0x5cc` | Task | `amdgpu` (build-ID: 48030e986eac) | [amdgpu\_ttm.c:2103](#3-amdgpu_ttm_init--call-site-amdgpu_ttmc2103) |
| | `amdgpu_bo_init.cold` | `0x5e` | `0x77` | Task | `amdgpu` | `amdgpu_object.c:1088` |
| | `gmc_v12_0_sw_init` | `0x470` | `0x6f0` | Task | `amdgpu` | `gmc_v12_0.c:847` |
| | `amdgpu_device_ip_init` | `0x8f` | `0xb43` | Task | `amdgpu` | |
| | `amdgpu_device_init.cold` | `0x1495` | `0x1abe` | Task | `amdgpu` | |
| | `amdgpu_driver_load_kms` | `0x1a` | `0x80` | Task | `amdgpu` | |
| | `amdgpu_pci_probe` | `0x28e` | `0x760` | Task | `amdgpu` | |
| | `local_pci_probe` | `0x51` | `0xc0` | Task | *(built-in)* | |
| | `work_for_cpu_fn` | `0x1d` | `0x30` | Task | *(built-in)* | |
| | `process_scheduled_works` | `0x2bc` | `0x680` | Task | *(built-in)* | |
| | `worker_thread` | `0x1a6` | `0x4a0` | Task | *(built-in)* | |
| | `kthread` | `0x1a4` | `0x3a0` | Task | *(built-in)* | |
| | `ret_from_fork` | `0x1f8` | `0x3b0` | Task | *(built-in)* | |
| | `ret_from_fork_asm` | `0x1a` | `0x30` | Task | *(built-in)* | |

## CPU registers

| Register | Value | Note |
| -------- | ----- | ---- |
| RIP | `drm_mm_init+0xc1/0xd0` | Points to `ud2` (BUG) instruction at crash |
| RSP | `ffa000000d723b60` | Valid kernel stack address |
| RAX | `0000000000000000` | Zero |
| RBX | `ff11004093c27400` | Most likely `rman` (the caller's `struct ttm_range_manager`); `&rman->mm` in RDI lies 0x80 bytes above it |
| RCX | `000000000000001c` | |
| RDX | `0000000000000000` | **`size` arg to `drm_mm_init` = 0**; this is the bad value |
| RSI | `0000000000000000` | **`start` arg to `drm_mm_init` = 0** |
| RDI | `ff11004093c27480` | **`mm` arg to `drm_mm_init`** |
| RBP | `0000000000000003` | |
| R08 | `0000000000000dc0` | |
| R09 | `00000000ffffffff` | |
| R10 | `ff11004093c27400` | |
| R11 | `0000000000000100` | |
| R12 | `ff1100408b60f048` | |
| R13 | `0000000000000000` | |
| R14 | `0000000000000000` | |
| R15 | `000000000000000b` | |
| CR2 | `00007f15ed7d9f55` | Page-fault address (from a previous unrelated fault) |
| CR4 | `0000000000771ef0` | |
| EFLAGS | `00010246` | ZF=1 (zero flag set, consistent with a comparison of two equal values) |

**Key observation:** RSI = 0 (start) and RDX = 0 (size) confirm that `drm_mm_init` was called with `start=0, size=0`. The asserted condition `start + size <= start` becomes `0 + 0 <= 0`, which is true, so the BUG fires.

## Code bytes

```
Code: 83 05 c2 c5 11 04 01 48 c7 83 f0 00 00 00 00 00 00 00 e8 a2 79 cd ff 48
      83 05 b2 c5 11 04 01 5b c3 cc cc cc cc 0f 1f 40 00 90 <0f> 0b    ← ud2 (BUG()) at drm_mm_init+0xc1
      66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90
```

The `ud2` at `<0f> 0b` sits near the end of `drm_mm_init` (offset 0xc1 in a 0xd0-byte function). The normal path returns via `ret` (`c3`) just before it; the BUG path branches to the out-of-line `ud2`. This is the usual layout the compiler produces for `BUG()`.

## Backtrace source code

### 1. `drm_mm_init` — crash site (`drm_mm.c:930`)

[`drivers/gpu/drm/drm_mm.c` at v6.18.16](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/gpu/drm/drm_mm.c?h=v6.18.16#n920)

```c
920 /**
921  * drm_mm_init - initialize a drm-mm allocator
922  * @mm: the drm_mm structure to initialize
923  * @start: start of the range managed by @mm
924  * @size: end of the range managed by @mm
925  *
926  * Note that @mm must be cleared to 0 before calling this function.
927  */
928 void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
929 {
930         DRM_MM_BUG_ON(start + size <= start);   // ← CRASH HERE (size == 0, start == 0 → 0 ≤ 0)
931
932         mm->color_adjust = NULL;
...
950 }
```

### 2. `ttm_range_man_init_nocheck` — call site (`ttm_range_manager.c:198`)

[`drivers/gpu/drm/ttm/ttm_range_manager.c` at v6.18.16](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/gpu/drm/ttm/ttm_range_manager.c?h=v6.18.16#n180)

```c
180 int ttm_range_man_init_nocheck(struct ttm_device *bdev,
181                                unsigned type, bool use_tt,
182                                unsigned long p_size)
183 {
184         struct ttm_resource_manager *man;
185         struct ttm_range_manager *rman;
186
187         rman = kzalloc(sizeof(*rman), GFP_KERNEL);
188         if (!rman)
189                 return -ENOMEM;
190
191         man = &rman->manager;
192         man->use_tt = use_tt;
193         man->func = &ttm_range_manager_func;
194
195         ttm_resource_manager_init(man, bdev, p_size);
196
197         drm_mm_init(&rman->mm, 0, p_size);   // ← call here; p_size == 0 when GDS/GWS/OA absent
                                                 // ← RSI = 0 (start), RDX = 0 (p_size)
198         spin_lock_init(&rman->lock);
199
200         ttm_set_driver_manager(bdev, type, &rman->manager);
201         ttm_resource_manager_set_used(man, true);
202         return 0;
203 }
```

### 3. `amdgpu_ttm_init` — call site (`amdgpu_ttm.c:2103`)

[`drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c` at v6.18.16](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c?h=v6.18.16#n2100)

```c
2095         /* Initialize various on-chip memory pools */
2103         r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GDS, adev->gds.gds_size);   // ← call here; gds_size == 0
2104         if (r) {
2105                 dev_err(adev->dev, "Failed initializing GDS heap.\n");
2106                 return r;
2107         }
2108
2109         r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GWS, adev->gds.gws_size);   // ← same issue
2110         if (r) {
2111                 dev_err(adev->dev, "Failed initializing gws heap.\n");
2112                 return r;
2113         }
2114
2115         r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_OA, adev->gds.oa_size);     // ← same issue
2116         if (r) {
2117                 dev_err(adev->dev, "Failed initializing oa heap.\n");
2118                 return r;
2119         }
```

`amdgpu_ttm_init_on_chip` (lines 74–80) is a thin wrapper that forwards its size argument unchanged:

[`drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c` at v6.18.16](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c?h=v6.18.16#n74)

```c
74 static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
75                                    unsigned int type,
76                                    uint64_t size_in_page)
77 {
78         return ttm_range_man_init(&adev->mman.bdev, type,
79                                   false, size_in_page);   // ← no guard for size_in_page == 0
80 }
```

`gfx_v11_0` (and all older GFX generations) populate `adev->gds.gds_size` during `gfx_v11_0_set_gds_init()` (lines 7399–7409 of `gfx_v11_0.c`):

```c
7399 static void gfx_v11_0_set_gds_init(struct amdgpu_device *adev)
7400 {
7401         unsigned total_cu = ...;
7402         adev->gds.gds_size = 0x1000;   // ← non-zero for GFX11 and earlier
7403         adev->gds.gds_compute_max_wave_id = total_cu * 32 - 1;
7404         adev->gds.gws_size = 64;
7405         adev->gds.oa_size = 16;
7406 }
```

`gfx_v12_0.c` has **no equivalent function**: GDS/GWS/OA were removed in RDNA4 (GFX 12). `adev->gds.{gds,gws,oa}_size` are never set and remain zero.

## What-how-where analysis

### What

`drm_mm_init()` fires a `DRM_MM_BUG_ON` assertion at `drm_mm.c:930`:

```c
DRM_MM_BUG_ON(start + size <= start);
```

The assertion triggers because both `start` and `size` are **0**: `0 + 0 = 0 ≤ 0`, so the condition is true and the BUG fires. The register dump confirms this: RSI = 0 (`start`) and RDX = 0 (`size`) hold the second and third arguments to `drm_mm_init(mm, start, size)` under the x86-64 System V calling convention.

The assertion is deliberately strict: the DRM memory-range manager requires a non-empty range, so passing `size=0` is a programming error on the caller's side. Note that in mainline `DRM_MM_BUG_ON` expands to `BUG_ON` only when `CONFIG_DRM_DEBUG_MM` is enabled and is otherwise compiled out, so the reporter's custom configuration presumably has that option set. This would explain why kernels built without it do not hit the crash on the same hardware.

### How

**Q1: Why was `drm_mm_init` called with `size = 0`?**

A1: `ttm_range_man_init_nocheck` passes its `p_size` parameter directly to `drm_mm_init` as the `size` argument (lines 197–198 of `ttm_range_manager.c`). `p_size` was 0, and nothing on this path checks for that case.

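
To make the arithmetic behind the assertion concrete, the predicate can be modelled in plain userspace C. This is only a sketch: `range_is_invalid()` is a hypothetical helper name chosen for illustration, not a kernel function, and `uint64_t` stands in for the kernel's `u64`.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the condition checked by DRM_MM_BUG_ON() in
 * drm_mm_init(). The name range_is_invalid() is hypothetical.
 *
 * Unsigned addition wraps modulo 2^64, so one comparison rejects both
 * an empty range (size == 0) and a range whose end would overflow.
 */
static bool range_is_invalid(uint64_t start, uint64_t size)
{
	return start + size <= start;
}
```

Under this model, `range_is_invalid(0, 0)` is true (the case in this report), `range_is_invalid(0, 0x1000)` is false (the GFX 11 GDS size), and `range_is_invalid(UINT64_MAX, 2)` is true because the sum wraps around.
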
**Q2: Why was `ttm_range_man_init_nocheck` called with `p_size = 0`?**

A2: `amdgpu_ttm_init_on_chip` (lines 78–79 of `amdgpu_ttm.c`) calls `ttm_range_man_init`, an inline wrapper around `ttm_range_man_init_nocheck`, with `size_in_page` as-is. The value it was given is `adev->gds.gds_size`.

**Q3: Why is `adev->gds.gds_size == 0`?**

A3: The AMD GFX 12 / RDNA4 architecture (used by the RX 9070 XT) does **not have GDS (Global Data Share), GWS (Global Wave Sync), or OA (Ordered Append)** hardware. These resources were present on GFX 11 and earlier. `gfx_v11_0.c` explicitly initialises them in `gfx_v11_0_set_gds_init()`. `gfx_v12_0.c` has no such function, so `adev->gds.{gds,gws,oa}_size` keep their zero-initialised default values.

**Root cause (Negative How):** The `amdgpu_ttm_init()` code that initialises the GDS/GWS/OA TTM memory pools at lines 2103–2119 of `amdgpu_ttm.c` predates GFX 12. It implicitly assumes non-zero sizes, which held for every earlier architecture. When GFX 12 support was added in `gfx_v12_0.c`, no equivalent of `gfx_v11_0_set_gds_init()` was added (correctly, since the hardware no longer has GDS/GWS/OA), but `amdgpu_ttm_init()` was not updated to guard against zero sizes.

### Where

**The bug:** `amdgpu_ttm_init_on_chip` does not guard against `size_in_page == 0`. Every caller that passes a GDS/GWS/OA size will pass 0 on RDNA4 and on any future architecture that lacks these resources.

**Preferred fix location:** `amdgpu_ttm_init_on_chip()` in `amdgpu_ttm.c`. Adding an early return for zero size is the most defensive approach and protects all three call sites (GDS, GWS, OA) at once.

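
The effect of such a guard can be modelled in userspace. The sketch below is not kernel code: `model_range_man_init()`, `model_init_on_chip()` and `pools_created` are hypothetical stand-ins, and `abort()` models the BUG that a zero size triggers in a kernel built with DRM MM debugging.

```c
#include <stdint.h>
#include <stdlib.h>

/* Number of pools the model has created; used only to observe behaviour. */
static unsigned int pools_created;

/*
 * Hypothetical stand-in for ttm_range_man_init(): a zero size is fatal,
 * mirroring the assertion in drm_mm_init().
 */
static int model_range_man_init(uint64_t p_size)
{
	if (p_size == 0)
		abort();	/* models the BUG */
	pools_created++;
	return 0;
}

/*
 * Hypothetical stand-in for amdgpu_ttm_init_on_chip() with the proposed
 * guard: an absent resource (size 0) is skipped and reported as success.
 */
static int model_init_on_chip(uint64_t size_in_page)
{
	if (!size_in_page)
		return 0;	/* nothing to manage on this hardware */
	return model_range_man_init(size_in_page);
}
```

With the guard in place, `model_init_on_chip(0)` returns 0 without creating a pool, while `model_init_on_chip(0x1000)` creates one as before. The sketch does not model teardown; whether the real teardown path tolerates a manager that was never registered would need to be checked against the kernel source.
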
Proposed fix (diff form):

```diff
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -74,6 +74,9 @@ static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
 				    unsigned int type,
 				    uint64_t size_in_page)
 {
+	if (!size_in_page)
+		return 0;
+
 	return ttm_range_man_init(&adev->mman.bdev, type,
 				  false, size_in_page);
 }
```

An alternative (slightly more explicit) fix is to guard the three call sites individually inside `amdgpu_ttm_init()`:

```diff
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2100,18 +2100,24 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 	/* Initialize various on-chip memory pools */
-	r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GDS, adev->gds.gds_size);
-	if (r) {
-		dev_err(adev->dev, "Failed initializing GDS heap.\n");
-		return r;
+	if (adev->gds.gds_size) {
+		r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GDS, adev->gds.gds_size);
+		if (r) {
+			dev_err(adev->dev, "Failed initializing GDS heap.\n");
+			return r;
+		}
 	}

-	r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GWS, adev->gds.gws_size);
-	if (r) {
-		dev_err(adev->dev, "Failed initializing gws heap.\n");
-		return r;
+	if (adev->gds.gws_size) {
+		r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GWS, adev->gds.gws_size);
+		if (r) {
+			dev_err(adev->dev, "Failed initializing gws heap.\n");
+			return r;
+		}
 	}

-	r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_OA, adev->gds.oa_size);
-	if (r) {
-		dev_err(adev->dev, "Failed initializing oa heap.\n");
-		return r;
+	if (adev->gds.oa_size) {
+		r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_OA, adev->gds.oa_size);
+		if (r) {
+			dev_err(adev->dev, "Failed initializing oa heap.\n");
+			return r;
+		}
 	}
```

The first (single-function) patch is preferred because it is smaller and protects any future callers of `amdgpu_ttm_init_on_chip` that might also receive a zero size.

## Bug introduction

The bug was introduced in two stages:

1. The GDS/GWS/OA TTM pool initialisation code was added to `amdgpu_ttm_init()` in commit [`473633540c2f`](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=473633540c2f51) (Christian König, 2020-07-23). At the time, all supported architectures had GDS and the code was correct.
2. RDNA4 / GFX 12 support was added to `gfx_v12_0.c` without a `gfx_v12_0_set_gds_init()` equivalent (correctly, since the hardware removed GDS/GWS/OA), but the `amdgpu_ttm_init()` caller was not updated to handle zero sizes. The first commit touching `gmc_v12_0.c` that brought RDNA4 support predates v6.18.16 (not individually identified within budget).

The primary bug introduction is the **absence of a zero-size guard** in `amdgpu_ttm_init_on_chip()` combined with the legitimate non-initialisation of `adev->gds.*_size` for GFX 12. The exact introducing commit was not identified within the search budget; the RDNA4 bring-up series, which predates v6.18.16, is the effective origin.

**Recommendation for the reporter:** Apply the small zero-size guard to `amdgpu_ttm.c`:

```c
static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
				    unsigned int type,
				    uint64_t size_in_page)
{
	if (!size_in_page)	// ← add this guard
		return 0;

	return ttm_range_man_init(&adev->mman.bdev, type,
				  false, size_in_page);
}
```

As a stopgap, rebuilding the kernel without `CONFIG_DRM_DEBUG_MM` should also avoid the crash, because `DRM_MM_BUG_ON` is compiled out when that option is disabled.

**No upstream fix was found** in the git history through v6.19.13 / origin/master within the search budget. This appears to be an unresolved bug that should be reported to the amdgpu mailing list (amd-gfx@lists.freedesktop.org) with this analysis attached.

**Confidence:** High. The register dump (RSI=0, RDX=0) directly confirms the zero-size call, and the absence of any GDS initialisation in `gfx_v12_0.c` explains where the zero comes from.