Linux kernel crash report

Source: kernel Bugzilla bug #221376 — “AMD RADEON RX 9070 XT - modprobe amdgpu is fail.”

Key elements

Field	Value	Implication
BUG_URL	bugzilla.kernel.org #221376
UNAME	`6.18.22-x86_64`	Custom kernel compiled by the reporter (build string: `#1 SMP PREEMPT_DYNAMIC Fri Apr 17 01:05:45 +07 2026`)
DISTRO	(none — custom kernel)	No distro debug packages available; vmlinux not obtainable
HARDWARE	GIGABYTE MD72-HB1-00 (dual-socket Intel Xeon Silver 4310 @ 2.10 GHz, 48 threads)
BIOS	F40 12/09/2025
PROCESS	`kworker/12:1`	Kernel worker thread on CPU 12
WORKQUEUE	`events work_for_cpu_fn`	Device probe via `local_pci_probe` run on a specific CPU
TAINT	(Not tainted)	Clean kernel; no proprietary or out-of-tree code
CRASH_TYPE	BUG/BUG_ON	`DRM_MM_BUG_ON(start + size <= start)` in `drm_mm_init()`
CRASH_SITE	`drivers/gpu/drm/drm_mm.c:930`	Assertion inside the DRM memory-range allocator initialiser
SOURCEDIR	`/sdb1/arjan/git/oops-skill/oops-workdir/linux` (tag `v6.18.16`)	Nearest stable tag; 6.18.22 not present — analysis is approximate
VMLINUX	(not available — custom kernel)	`addr2line` mapping not possible; source used directly

Kernel modules

Module	Flags	Backtrace	Flag Implication
amdgpu	+	Y	Being loaded at time of crash (`module_init` path likely involved)
amdxcp
drm_ttm_helper
ttm		Y
drm_exec
drm_panel_backlight_quirks
gpu_sched
drm_suballoc_helper
drm_buddy
drm_display_helper
cec
rc_core
igb
i2c_algo_bit

Backtrace

Address	Function	Offset	Size	Context	Module	Source location
(RIP)	`drm_mm_init`	`0xc1`	`0xd0`	Task	(built-in)	drm_mm.c:930
	`ttm_range_man_init_nocheck`	`0x9d`	`0x180`	Task	`ttm` (build-ID: e3a55dbbe0be)	ttm_range_manager.c:198
	`amdgpu_ttm_init.cold`	`0x45e`	`0x5cc`	Task	`amdgpu` (build-ID: 48030e986eac)	amdgpu_ttm.c:2103
	`amdgpu_bo_init.cold`	`0x5e`	`0x77`	Task	`amdgpu`	`amdgpu_object.c:1088`
	`gmc_v12_0_sw_init`	`0x470`	`0x6f0`	Task	`amdgpu`	`gmc_v12_0.c:847`
	`amdgpu_device_ip_init`	`0x8f`	`0xb43`	Task	`amdgpu`
	`amdgpu_device_init.cold`	`0x1495`	`0x1abe`	Task	`amdgpu`
	`amdgpu_driver_load_kms`	`0x1a`	`0x80`	Task	`amdgpu`
	`amdgpu_pci_probe`	`0x28e`	`0x760`	Task	`amdgpu`
	`local_pci_probe`	`0x51`	`0xc0`	Task	(built-in)
	`work_for_cpu_fn`	`0x1d`	`0x30`	Task	(built-in)
	`process_scheduled_works`	`0x2bc`	`0x680`	Task	(built-in)
	`worker_thread`	`0x1a6`	`0x4a0`	Task	(built-in)
	`kthread`	`0x1a4`	`0x3a0`	Task	(built-in)
	`ret_from_fork`	`0x1f8`	`0x3b0`	Task	(built-in)
	`ret_from_fork_asm`	`0x1a`	`0x30`	Task	(built-in)
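Each backtrace entry follows the `symbol+0xoffset/0xsize` convention. The following is a minimal sketch of how such an entry can be decoded and sanity-checked in user space; `struct frame` and `parse_frame` are hypothetical names introduced for illustration and are not part of any kernel tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One decoded "symbol+0xoffset/0xsize" backtrace entry. */
struct frame {
    char symbol[64];
    unsigned long offset; /* bytes from the start of the function */
    unsigned long size;   /* total size of the function in bytes */
};

/* Parse e.g. "drm_mm_init+0xc1/0xd0". Returns 0 on success, -1 on error. */
static int parse_frame(const char *text, struct frame *out)
{
    assert(text != NULL && out != NULL);

    /* %63[^+] reads the symbol name up to (not including) the '+';
     * %lx accepts hexadecimal with an optional 0x prefix. */
    if (sscanf(text, "%63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->size) != 3)
        return -1;

    /* An address inside a function must lie below the function's size. */
    return out->offset < out->size ? 0 : -1;
}
```

For the RIP entry this yields offset 0xc1 in a 0xd0-byte function, which places the faulting instruction 15 bytes before the end of `drm_mm_init`.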

CPU registers

Register	Value	Note
RIP	`drm_mm_init+0xc1/0xd0`	Points to `ud2` (BUG) instruction at crash
RSP	`ffa000000d723b60`	Valid kernel stack address
RAX	`0000000000000000`	Zero
RBX	`ff11004093c27400`	Probably the caller's `rman` (`struct ttm_range_manager`); RDI = RBX + 0x80, consistent with `&rman->mm`
RCX	`000000000000001c`
RDX	`0000000000000000`	`size` arg to `drm_mm_init` = 0 — this is the bad value
RSI	`0000000000000000`	`start` arg to `drm_mm_init` = 0
RDI	`ff11004093c27480`	`mm` arg to `drm_mm_init`
RBP	`0000000000000003`
R08	`0000000000000dc0`
R09	`00000000ffffffff`
R10	`ff11004093c27400`
R11	`0000000000000100`
R12	`ff1100408b60f048`
R13	`0000000000000000`
R14	`0000000000000000`
R15	`000000000000000b`
CR2	`00007f15ed7d9f55`	Page-fault address (from a previous unrelated fault)
CR4	`0000000000771ef0`
EFLAGS	`00010246`	ZF=1 (zero flag set — result of the failed comparison)
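The EFLAGS note can be checked by decoding the value bit by bit. The sketch below uses the architectural bit positions of the x86 flags register; the macro and function names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit masks for selected x86 EFLAGS bits. */
#define EFLAGS_CF (UINT32_C(1) << 0)  /* carry */
#define EFLAGS_PF (UINT32_C(1) << 2)  /* parity */
#define EFLAGS_ZF (UINT32_C(1) << 6)  /* zero */
#define EFLAGS_SF (UINT32_C(1) << 7)  /* sign */
#define EFLAGS_IF (UINT32_C(1) << 9)  /* interrupts enabled */
#define EFLAGS_OF (UINT32_C(1) << 11) /* overflow */
#define EFLAGS_RF (UINT32_C(1) << 16) /* resume */

/* Return true when the flag selected by `mask` is set in `eflags`. */
static bool eflags_test(uint32_t eflags, uint32_t mask)
{
    assert(mask != 0);
    return (eflags & mask) != 0;
}
```

Applied to `0x00010246`, this reports ZF, PF, IF and RF as set and CF, SF and OF as clear, which agrees with the ZF=1 note in the table.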

Key observation: RSI = 0 (start) and RDX = 0 (size) show that drm_mm_init() was called with start = 0 and size = 0. The asserted condition start + size <= start then evaluates to 0 + 0 <= 0, which is true, so the BUG fires.
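The asserted condition can be modelled in user space to see which inputs trip it. This is a sketch only: `range_is_invalid` and `check_reported_values` are hypothetical helpers that reuse the expression from `drm_mm_init()` on plain `uint64_t` values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the range [start, start + size) is empty or wraps past 2^64.
 * Same expression as the one inside DRM_MM_BUG_ON() in drm_mm_init(). */
static bool range_is_invalid(uint64_t start, uint64_t size)
{
    /* Unsigned addition wraps modulo 2^64, so an overflowing sum is
     * also less than or equal to start. */
    return start + size <= start;
}

/* Values taken from the register dump: RSI (start) = 0, RDX (size) = 0. */
static void check_reported_values(void)
{
    assert(range_is_invalid(0, 0));       /* the crashing call */
    assert(!range_is_invalid(0, 0x1000)); /* GFX11-style GDS size */
}
```

The same expression therefore rejects two different caller mistakes: an empty range (the case in this report) and a range whose end overflows 64 bits.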

Code bytes

Code: 83 05 c2 c5 11 04 01 48 c7 83 f0 00 00 00 00 00 00 00 e8 a2 79 cd ff
      48 83 05 b2 c5 11 04 01 5b c3 cc cc cc cc 0f 1f 40 00 90
      <0f> 0b   ← ud2 (BUG()) at drm_mm_init+0xc1
      66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90

The ud2 at <0f> 0b is placed at the end of drm_mm_init (offset 0xc1 in a 0xd0-byte function). The compiler’s normal path returns via ret (c3) earlier; the BUG path jumps here. This is the classic Linux BUG() layout.
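To illustrate how the marked byte is recognised as a BUG site, the sketch below checks a short excerpt of the Code: dump for the two-byte `ud2` opcode (`0f 0b`). The array holds the 11 bytes before the marker, the marked byte, and the 4 bytes after it; `oops_excerpt`, `FAULT_INDEX` and `is_ud2_at` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Excerpt of the Code: line around the <0f> marker (index 11). */
static const uint8_t oops_excerpt[] = {
    0x5b, 0xc3,             /* pop %rbx; ret (normal exit path) */
    0xcc, 0xcc, 0xcc, 0xcc, /* int3 padding */
    0x0f, 0x1f, 0x40, 0x00, /* 4-byte nop */
    0x90,                   /* nop */
    0x0f, 0x0b,             /* ud2, the marked instruction */
    0x66, 0x66, 0x2e,       /* start of the trailing padding */
};

#define FAULT_INDEX 11 /* position of the <0f> byte within the excerpt */

/* Return 1 if the two bytes at code[pos] encode ud2 (0f 0b), else 0. */
static int is_ud2_at(const uint8_t *code, size_t len, size_t pos)
{
    assert(code != NULL);
    if (len < 2 || pos > len - 2)
        return 0; /* not enough bytes left for a two-byte opcode */
    return code[pos] == 0x0f && code[pos + 1] == 0x0b;
}
```

Note that index 6 also starts with `0f`, but the second byte is `1f` (a multi-byte nop), so only the marked position decodes as `ud2`.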

Backtrace source code

1. `drm_mm_init` — crash site (`drm_mm.c:930`)

drivers/gpu/drm/drm_mm.c at v6.18.16

920  /**
921   * drm_mm_init - initialize a drm-mm allocator
922   * @mm: the drm_mm structure to initialize
923   * @start: start of the range managed by @mm
924   * @size: end of the range managed by @mm
925   *
926   * Note that @mm must be cleared to 0 before calling this function.
927   */
928  void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
929  {
930      DRM_MM_BUG_ON(start + size <= start);   // ← CRASH HERE (size == 0, start == 0 → 0 ≤ 0)
931
932      mm->color_adjust = NULL;
     ...
950  }

2. `ttm_range_man_init_nocheck` — call site (`ttm_range_manager.c:198`)

drivers/gpu/drm/ttm/ttm_range_manager.c at v6.18.16

180  int ttm_range_man_init_nocheck(struct ttm_device *bdev,
181                         unsigned type, bool use_tt,
182                         unsigned long p_size)
183  {
184      struct ttm_resource_manager *man;
185      struct ttm_range_manager *rman;
186
187      rman = kzalloc(sizeof(*rman), GFP_KERNEL);
188      if (!rman)
189          return -ENOMEM;
190
191      man = &rman->manager;
192      man->use_tt = use_tt;
193      man->func = &ttm_range_manager_func;
194
195      ttm_resource_manager_init(man, bdev, p_size);
196
197      drm_mm_init(&rman->mm, 0, p_size);   // ← call here; p_size == 0 when GDS/GWS/OA absent
     // ← RSI = 0 (start), RDX = 0 (p_size)
198      spin_lock_init(&rman->lock);
199
200      ttm_set_driver_manager(bdev, type, &rman->manager);
201      ttm_resource_manager_set_used(man, true);
202      return 0;
203  }

3. `amdgpu_ttm_init` — call site (`amdgpu_ttm.c:2103`)

drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c at v6.18.16

2095      /* Initialize various on-chip memory pools */
2103      r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GDS, adev->gds.gds_size);  // ← call here; gds_size == 0
2104      if (r) {
2105          dev_err(adev->dev, "Failed initializing GDS heap.\n");
2106          return r;
2107      }
2108
2109      r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GWS, adev->gds.gws_size);  // ← same issue
2110      if (r) {
2111          dev_err(adev->dev, "Failed initializing gws heap.\n");
2112          return r;
2113      }
2114
2115      r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_OA, adev->gds.oa_size);   // ← same issue
2116      if (r) {
2117          dev_err(adev->dev, "Failed initializing oa heap.\n");
2118          return r;
2119      }

amdgpu_ttm_init_on_chip (lines 74–80) is a one-liner:

drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c at v6.18.16

74  static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
75                                      unsigned int type,
76                                      uint64_t size_in_page)
77  {
78      return ttm_range_man_init(&adev->mman.bdev, type,
79                                false, size_in_page);  // ← no guard for size_in_page == 0
80  }

gfx_v11_0 populates adev->gds.gds_size in gfx_v11_0_set_gds_init() (lines 7399–7409 of gfx_v11_0.c); all older GFX generations likewise populate these fields:

7399  static void gfx_v11_0_set_gds_init(struct amdgpu_device *adev)
7400  {
7401      unsigned total_cu = ...;
7402      adev->gds.gds_size = 0x1000;    // ← non-zero for GFX11 and earlier
7403      adev->gds.gds_compute_max_wave_id = total_cu * 32 - 1;
7404      adev->gds.gws_size = 64;
7405      adev->gds.oa_size = 16;
7406  }

gfx_v12_0.c has no equivalent function — GDS/GWS/OA were removed in RDNA4 (GFX 12). adev->gds.{gds,gws,oa}_size are never set and remain zero.

What-how-where analysis

What

drm_mm_init() fires a DRM_MM_BUG_ON assertion at drm_mm.c:930:

DRM_MM_BUG_ON(start + size <= start);

The assertion is triggered because both start and size are 0: 0 + 0 = 0 ≤ 0 → condition is true → BUG fires.

This is confirmed by the register dump: RSI = 0 (start) and RDX = 0 (size), which are the second and third arguments to drm_mm_init(mm, start, size) on x86-64 System V ABI.
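The register-to-argument mapping used above can be written down as a small lookup. This sketch covers only integer and pointer arguments under the x86-64 System V calling convention; the function names are hypothetical. Registers reflect arguments reliably only near function entry, which is assumed to hold here because the assertion is the first statement of `drm_mm_init()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Integer/pointer argument registers in x86-64 System V calling order. */
static const char *const sysv_int_arg_regs[] = {
    "RDI", "RSI", "RDX", "RCX", "R8", "R9",
};

/* Return the register carrying the zero-based integer argument `index`,
 * or NULL when the argument would be passed on the stack instead. */
static const char *sysv_arg_register(size_t index)
{
    size_t count = sizeof(sysv_int_arg_regs) / sizeof(sysv_int_arg_regs[0]);

    return index < count ? sysv_int_arg_regs[index] : NULL;
}

/* drm_mm_init(mm, start, size): mm is argument 0, start 1, size 2. */
static void check_drm_mm_init_mapping(void)
{
    assert(strcmp(sysv_arg_register(0), "RDI") == 0); /* mm */
    assert(strcmp(sysv_arg_register(1), "RSI") == 0); /* start */
    assert(strcmp(sysv_arg_register(2), "RDX") == 0); /* size */
}
```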

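The assertion is deliberately strict: the DRM memory-range manager requires a non-empty range. Passing size=0 is a programming error on the caller’s side. Note that DRM_MM_BUG_ON() expands to a real BUG_ON() only when CONFIG_DRM_DEBUG_MM is enabled, so the reporter’s custom kernel presumably has that option set; without it, the zero-size call would not trip this check.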

How

Q1: Why was drm_mm_init called with size = 0?

A1: ttm_range_man_init_nocheck passes its p_size parameter directly to drm_mm_init as the size argument (lines 197–198 of ttm_range_manager.c). p_size was 0, and no check is made for this case.

Q2: Why was ttm_range_man_init_nocheck called with p_size = 0?

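A2: amdgpu_ttm_init_on_chip (lines 78–79 of amdgpu_ttm.c) calls ttm_range_man_init with size_in_page as-is. ttm_range_man_init is an inline wrapper in the TTM headers that forwards to ttm_range_man_init_nocheck, which is why only the _nocheck symbol appears in the backtrace. The size passed in was adev->gds.gds_size.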

Q3: Why is adev->gds.gds_size == 0?

A3: The AMD GFX 12 / RDNA4 architecture (used by the RX 9070 XT) does not have GDS (Global Data Share), GWS (Global Wave Sync), or OA (Ordered Append) hardware. These resources were present on GFX 11 and earlier. gfx_v11_0.c explicitly initialises them in gfx_v11_0_set_gds_init(). gfx_v12_0.c has no such function, so adev->gds.{gds,gws,oa}_size are left at their zero-initialised default values.

Root cause (Negative How): The amdgpu_ttm_init() code that initialises GDS/GWS/OA TTM memory pools at lines 2103–2119 of amdgpu_ttm.c has existed since before GFX 12 was introduced. It correctly assumes non-zero sizes because all previous architectures have these resources. When GFX 12 support was added to gfx_v12_0.c, no equivalent of gfx_v11_0_set_gds_init() was added (correctly, since the hardware no longer has GDS/GWS/OA), but the code in amdgpu_ttm_init() was not updated to guard against zero sizes.
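The causal chain from Q1 to Q3 can be reproduced with a small user-space model. This is a sketch under stated assumptions: every `model_*` name is a hypothetical stand-in, the "would BUG" result replaces the real `BUG()`, and the GFX11 values are the ones shown in the gfx_v11_0_set_gds_init() listing above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three adev->gds size fields. */
struct model_gds {
    uint64_t gds_size;
    uint64_t gws_size;
    uint64_t oa_size;
};

/* Pool order matches the call order in amdgpu_ttm_init(). */
enum model_pool {
    MODEL_POOL_NONE = -1,
    MODEL_POOL_GDS,
    MODEL_POOL_GWS,
    MODEL_POOL_OA,
};

/* Stand-in for drm_mm_init(): reports the assertion instead of BUG(). */
static bool model_drm_mm_init_would_bug(uint64_t start, uint64_t size)
{
    return start + size <= start;
}

/* Stand-in for ttm_range_man_init_nocheck(), which passes start = 0. */
static bool model_range_man_would_bug(uint64_t p_size)
{
    return model_drm_mm_init_would_bug(0, p_size);
}

/* Walk the pools in call order; return the first one that would BUG. */
static enum model_pool model_first_failing_pool(const struct model_gds *gds)
{
    assert(gds != NULL);
    if (model_range_man_would_bug(gds->gds_size))
        return MODEL_POOL_GDS;
    if (model_range_man_would_bug(gds->gws_size))
        return MODEL_POOL_GWS;
    if (model_range_man_would_bug(gds->oa_size))
        return MODEL_POOL_OA;
    return MODEL_POOL_NONE;
}

/* Mirrors the values assigned by gfx_v11_0_set_gds_init(). */
static void model_gfx11_set_gds_init(struct model_gds *gds)
{
    gds->gds_size = 0x1000;
    gds->gws_size = 64;
    gds->oa_size = 16;
}
```

With a zero-initialised structure (the GFX 12 case) the model stops at the GDS pool, matching the call site at amdgpu_ttm.c:2103 in the backtrace; with the GFX 11 values no pool fails.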

Where

The bug: amdgpu_ttm_init_on_chip does not guard against size_in_page == 0. All callers that pass GDS/GWS/OA sizes will pass 0 for RDNA4 and any future architecture that lacks these resources.

Preferred fix location: amdgpu_ttm_init_on_chip() in amdgpu_ttm.c. Adding an early return for zero size is the most defensive approach and protects all three call sites (GDS, GWS, OA) at once.

Proposed fix (diff form):

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -75,6 +75,9 @@ static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
                        unsigned int type,
                        uint64_t size_in_page)
 {
+   if (!size_in_page)
+       return 0;
+
    return ttm_range_man_init(&adev->mman.bdev, type,
                  false, size_in_page);
 }

An alternative (slightly more explicit) fix is to guard the three call sites individually inside amdgpu_ttm_init():

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2100,19 +2100,25 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 
    /* Initialize various on-chip memory pools */
-   r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GDS, adev->gds.gds_size);
-   if (r) {
-       dev_err(adev->dev, "Failed initializing GDS heap.\n");
-       return r;
+   if (adev->gds.gds_size) {
+       r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GDS, adev->gds.gds_size);
+       if (r) {
+           dev_err(adev->dev, "Failed initializing GDS heap.\n");
+           return r;
+       }
    }
 
-   r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GWS, adev->gds.gws_size);
-   if (r) {
-       dev_err(adev->dev, "Failed initializing gws heap.\n");
-       return r;
+   if (adev->gds.gws_size) {
+       r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GWS, adev->gds.gws_size);
+       if (r) {
+           dev_err(adev->dev, "Failed initializing gws heap.\n");
+           return r;
+       }
    }
 
-   r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_OA, adev->gds.oa_size);
-   if (r) {
-       dev_err(adev->dev, "Failed initializing oa heap.\n");
-       return r;
+   if (adev->gds.oa_size) {
+       r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_OA, adev->gds.oa_size);
+       if (r) {
+           dev_err(adev->dev, "Failed initializing oa heap.\n");
+           return r;
+       }
    }

The first (single-function) patch is preferred because it is smaller and protects any future callers of amdgpu_ttm_init_on_chip that might also receive a zero size.
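The behaviour of the preferred patch can be sketched in user space as follows. The `model_*` functions are hypothetical stand-ins for `amdgpu_ttm_init_on_chip()` and `ttm_range_man_init()`, and the `created` out-parameter exists only so the example can observe whether a pool would have been set up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ttm_range_man_init(): a zero size must never reach it. */
static int model_range_man_init(uint64_t p_size)
{
    /* Precondition that the kernel enforces with DRM_MM_BUG_ON(). */
    assert(p_size > 0);
    return 0;
}

/* Model of the patched amdgpu_ttm_init_on_chip() with the early return. */
static int model_init_on_chip_guarded(uint64_t size_in_page, bool *created)
{
    assert(created != NULL);
    *created = false;

    if (!size_in_page)
        return 0; /* absent pool: nothing to create, and not an error */

    *created = true;
    return model_range_man_init(size_in_page);
}
```

Returning 0 for an absent pool keeps the three existing error checks in the caller untouched. This sketch does not examine whether later lookups or teardown paths tolerate a pool type that was never registered; that would need to be confirmed against the driver source before submitting the patch.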

Bug introduction

The bug was introduced in two stages:

1. The GDS/GWS/OA TTM pool initialisation code was added to amdgpu_ttm_init() in commit 473633540c2f (Christian König, 2020-07-23). At the time, all supported architectures had GDS and the code was correct.
2. RDNA4 / GFX 12 support was added to gfx_v12_0.c without a gfx_v12_0_set_gds_init() equivalent (correctly, since the hardware removed GDS/GWS/OA), but the amdgpu_ttm_init() caller was not updated to handle zero sizes. The first commit touching gmc_v12_0.c that brought RDNA4 support pre-dates v6.18.16 (not individually identified within budget).

The primary bug introduction is therefore the absence of a zero-size guard in amdgpu_ttm_init_on_chip() combined with the legitimate non-initialisation of adev->gds.*_size for GFX 12. The exact introducing commit was not identified within the search budget; the RDNA4 bring-up series, which predates v6.18.16, is the effective origin.

Analysis, conclusions and recommendations

Summary: Loading the amdgpu module for an AMD RX 9070 XT (RDNA4 / GFX 12) immediately crashes the kernel with a BUG assertion in drm_mm_init(). The root cause is that amdgpu_ttm_init() unconditionally tries to create TTM memory pools for GDS, GWS, and OA on-chip resources, passing size=0 to the DRM range allocator, which rejects a zero-sized range as invalid.

RDNA4 removed GDS/GWS/OA from the hardware; gfx_v12_0.c correctly leaves those size fields at zero. The missing piece is a zero-size guard in amdgpu_ttm_init_on_chip().

Recommendation for the reporter:

Apply the zero-size guard to amdgpu_ttm_init_on_chip() in amdgpu_ttm.c:

static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
                                    unsigned int type,
                                    uint64_t size_in_page)
{
    if (!size_in_page)        // ← add this guard
        return 0;
    return ttm_range_man_init(&adev->mman.bdev, type,
                              false, size_in_page);
}

No upstream fix was found in the git history through v6.19.13 / origin/master within the search budget. This appears to be an unresolved bug that should be reported to the amdgpu mailing list (amd-gfx@lists.freedesktop.org) with this analysis attached.

Confidence: High. The register dump (RSI=0, RDX=0) directly confirms the zero-size call, and the absence of any GDS initialisation in gfx_v12_0.c explains why the size is zero.