CVE-2025-37853 in Linux
Summary
by MITRE • 05/09/2025
In the Linux kernel, the following vulnerability has been resolved:
drm/amdkfd: debugfs hang_hws skip GPU with MES
debugfs hang_hws is used by the GPU reset test with HWS. For MES, this crashes the kernel with a NULL pointer access because dqm->packet_mgr is not set up for the MES path.
Skip GPUs with MES for now; the MES hang_hws debugfs interface will be supported later.
Analysis
by VulDB Data Team • 11/17/2025
The vulnerability identified as CVE-2025-37853 affects the Linux kernel's DRM subsystem, specifically the amdkfd driver, which provides the compute interface for AMD GPUs. The issue concerns the debugfs entry hang_hws, which GPU reset tests use to deliberately hang the hardware scheduler (HWS). On GPUs that are scheduled through MES (the Micro Engine Scheduler) instead, invoking hang_hws crashes the kernel with a NULL pointer access.
The technical flaw is a NULL pointer dereference (CWE-476). The hang_hws handler accesses dqm->packet_mgr, but the packet manager is not set up on the MES path, so the pointer is not valid when the handler runs on an MES-enabled GPU. The handler did not distinguish between the HWS path and the MES path before touching the packet manager, and the resulting access faults in kernel context.
The impact is on availability. When hang_hws is invoked for a GPU with MES enabled, the NULL pointer access crashes the kernel, and the system may need manual intervention, such as a reboot, to recover. Exposure is limited to systems with MES-enabled AMD GPUs on which the debugfs interface is used, typically during GPU reset testing. Because debugfs is accessible only to root by default, triggering the issue normally requires elevated privileges.
The kernel fix skips GPUs that use the MES path when the hang_hws debugfs interface is accessed. Rather than setting up the packet manager for MES, the fix prevents the code path that leads to the NULL pointer dereference from running at all, which keeps the change small and leaves hang_hws working for non-MES GPUs. The commit message describes this as a stopgap: support for hang_hws on MES is planned for later, so reset testing through this interface is unavailable on MES GPUs for now.
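The guard can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel patch: the struct and function names are hypothetical stand-ins, only the packet_mgr field name comes from the advisory, and the -EINVAL return value is an assumption about how a skipped GPU might be reported.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct packet_mgr_model {
    int hang_requests;                    /* counts simulated hang requests */
};

struct dqm_model {
    bool hws_enabled;                     /* hardware scheduler policy in use */
    struct packet_mgr_model *packet_mgr;  /* left NULL on the MES path */
};

struct gpu_node_model {
    bool mes_enabled;                     /* GPU is scheduled through MES */
    struct dqm_model dqm;
};

/* Models the operation that relies on a valid packet manager. */
int model_dqm_hang_hws(struct dqm_model *dqm)
{
    /* In the kernel, reaching this point with NULL is the reported crash. */
    assert(dqm->packet_mgr != NULL);
    dqm->packet_mgr->hang_requests++;
    return 0;
}

/* Models the guarded debugfs entry point: MES GPUs are skipped. */
int model_debugfs_hang_hws(struct gpu_node_model *node)
{
    if (!node->dqm.hws_enabled)
        return -EINVAL;                   /* hang_hws only applies with HWS */

    if (node->mes_enabled)
        return -EINVAL;                   /* assumed error code for "skipped" */

    return model_dqm_hang_hws(&node->dqm);
}
```

In this model, a node with mes_enabled set and packet_mgr left NULL gets an error back instead of reaching the dereference, while a non-MES node with a valid packet manager proceeds as before.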