CVE-2024-56552 in Linux

Summary

by MITRE • 12/27/2024

In the Linux kernel, the following vulnerability has been resolved:

drm/xe/guc_submit: fix race around suspend_pending

Currently in some testcases we can trigger:

xe 0000:03:00.0: [drm] Assertion `exec_queue_destroyed(q)` failed!
.... WARNING: CPU: 18 PID: 2640 at drivers/gpu/drm/xe/xe_guc_submit.c:1826 xe_guc_sched_done_handler+0xa54/0xef0 [xe]
xe 0000:03:00.0: [drm] *ERROR* GT1: DEREGISTER_DONE: Unexpected engine state 0x00a1, guc_id=57

Looking at a snippet of corresponding ftrace for this GuC id we can see:

162.673311: xe_sched_msg_add: dev=0000:03:00.0, gt=1 guc_id=57, opcode=3
162.673317: xe_sched_msg_recv: dev=0000:03:00.0, gt=1 guc_id=57, opcode=3
162.673319: xe_exec_queue_scheduling_disable: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0x29, flags=0x0
162.674089: xe_exec_queue_kill: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0x29, flags=0x0
162.674108: xe_exec_queue_close: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa9, flags=0x0
162.674488: xe_exec_queue_scheduling_done: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa9, flags=0x0
162.678452: xe_exec_queue_deregister: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa1, flags=0x0
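The guc_state values in this trace can be decoded by hand. The sketch below assumes a particular layout for the low bits of the Xe exec-queue state word:

- REGISTERED = bit 0
- ENABLED = bit 1
- PENDING_ENABLE = bit 2
- PENDING_DISABLE = bit 3
- DESTROYED = bit 4
- SUSPENDED = bit 5
- RESET = bit 6
- KILLED = bit 7

This layout is an assumption modelled on xe_guc_submit.c around this kernel version, and it should be checked against the exact tree being debugged. Under that assumption the trace reads as follows:

- 0x29 is REGISTERED | PENDING_DISABLE | SUSPENDED.
- The kill adds KILLED, giving 0xa9.
- At deregistration (0xa1) the DESTROYED bit is still clear, which is the condition the failed assertion reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * ASSUMED bit layout of the exec-queue state word printed as guc_state in
 * the xe trace events. Modelled on xe_guc_submit.c around this kernel
 * version; verify against the tree you are actually debugging.
 */
enum {
    QSTATE_REGISTERED      = 1 << 0,
    QSTATE_ENABLED         = 1 << 1,
    QSTATE_PENDING_ENABLE  = 1 << 2,
    QSTATE_PENDING_DISABLE = 1 << 3,
    QSTATE_DESTROYED       = 1 << 4,
    QSTATE_SUSPENDED       = 1 << 5,
    QSTATE_RESET           = 1 << 6,
    QSTATE_KILLED          = 1 << 7,
};

static const struct {
    unsigned int bit;
    const char *name;
} qstate_names[] = {
    { QSTATE_REGISTERED,      "REGISTERED" },
    { QSTATE_ENABLED,         "ENABLED" },
    { QSTATE_PENDING_ENABLE,  "PENDING_ENABLE" },
    { QSTATE_PENDING_DISABLE, "PENDING_DISABLE" },
    { QSTATE_DESTROYED,       "DESTROYED" },
    { QSTATE_SUSPENDED,       "SUSPENDED" },
    { QSTATE_RESET,           "RESET" },
    { QSTATE_KILLED,          "KILLED" },
};

/* True if every bit in flag is set in state. */
static bool has_flag(unsigned int state, unsigned int flag)
{
    return (state & flag) == flag;
}

/* Print the (assumed) names of all flags set in a guc_state value. */
static void print_guc_state(unsigned int state)
{
    printf("0x%04x:", state);
    for (size_t i = 0; i < sizeof(qstate_names) / sizeof(qstate_names[0]); i++)
        if (has_flag(state, qstate_names[i].bit))
            printf(" %s", qstate_names[i].name);
    printf("\n");
}
```

For example, print_guc_state(0xa1) prints "0x00a1: REGISTERED SUSPENDED KILLED". DESTROYED is absent from that output, yet the queue is being deregistered.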

It looks like we try to suspend the queue (opcode=3), setting suspend_pending and triggering a disable_scheduling. The user then closes the queue. However, the close also forcefully signals the suspend fence after killing the queue, and signalling the suspend fence clears suspend_pending. When the G2H response for disable_scheduling later comes back, suspend_pending is therefore already clear, so the handler incorrectly tries to also deregister the queue. This leads to warnings, since the queue has not yet even been marked for destruction. We also seem to trigger errors later when trying to unregister the same queue a second time.
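The sequence described above can be replayed deterministically in a small user-space model. Everything here is hypothetical: struct queue_model and the model_* functions are illustrative stand-ins, not driver code. The pre-fix reply handler is reduced to the single decision the text describes: finish the suspend if suspend_pending is set, otherwise continue towards deregistration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified model of one exec queue (not the driver struct). */
struct queue_model {
    bool registered;
    bool suspend_pending;
    bool pending_disable;
    bool destroyed;
    int  deregister_calls;
    bool bad_deregister;   /* deregistered while not marked destroyed */
};

/* Suspend request (opcode=3 in the trace): mark the suspend as pending and
 * ask the GuC to disable scheduling. */
static void model_suspend(struct queue_model *q)
{
    q->suspend_pending = true;
    q->pending_disable = true;
}

/* User closes the queue: the suspend fence is signalled early, which clears
 * suspend_pending. The kill itself is not modelled. */
static void model_close(struct queue_model *q)
{
    q->suspend_pending = false;
}

/* Pre-fix handling of the disable_scheduling reply, reduced to the decision
 * described in the commit message. */
static void model_sched_done_buggy(struct queue_model *q)
{
    q->pending_disable = false;
    if (q->suspend_pending) {
        q->suspend_pending = false;        /* signal the suspend fence */
    } else {
        if (!q->destroyed)
            q->bad_deregister = true;      /* kernel: exec_queue_destroyed(q) assertion */
        q->registered = false;
        q->deregister_calls++;
    }
}

/* Replays suspend -> close -> reply, the order seen in the ftrace snippet. */
static struct queue_model replay_trace_buggy(void)
{
    struct queue_model q = { .registered = true };
    model_suspend(&q);
    model_close(&q);
    model_sched_done_buggy(&q);
    return q;
}
```

After the replay the queue has been deregistered once even though destroyed is still false. A later teardown that deregisters it again would correspond to the "double unregister" errors.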

To fix this, tweak the ordering when handling the response to ensure we don't race with a disable_scheduling that didn't actually intend to perform an unregister. The destruction path should now also correctly wait for any pending_disable before marking the queue as destroyed.
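A minimal sketch of the fixed decision logic follows, under the same kind of simplified, hypothetical model. struct fixed_queue, reply_handler_fixed, try_mark_destroyed and replay_trace_fixed are illustrative names, not driver functions. The model is single-threaded, so it only shows the decisions. The comment in the handler explains why the order of "sample destroyed" and "clear pending_disable" matters once the destroy path runs concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified queue model; not the driver's structures. */
struct fixed_queue {
    bool suspend_pending;
    bool pending_disable;
    bool destroyed;
    bool registered;
    int  deregisters;
};

/*
 * Reply handler for "scheduling disabled", with the fixed ordering.
 * Concurrently, the destroy path waits for pending_disable to clear and then
 * marks the queue destroyed. Sampling destroyed BEFORE clearing
 * pending_disable means a destroy that starts after the clear cannot also
 * make this handler deregister the queue.
 */
static void reply_handler_fixed(struct fixed_queue *q)
{
    if (q->suspend_pending) {
        q->suspend_pending = false;   /* stands in for signalling the suspend fence */
        q->pending_disable = false;
        return;
    }
    bool destroyed = q->destroyed;    /* sample first ... */
    q->pending_disable = false;       /* ... then clear */
    if (destroyed) {
        q->registered = false;        /* only a destroyed queue is deregistered */
        q->deregisters++;
    }
}

/* Destroy path: refuses (the caller would wait) while a disable is in flight. */
static bool try_mark_destroyed(struct fixed_queue *q)
{
    if (q->pending_disable)
        return false;
    q->destroyed = true;
    return true;
}

/* Replays suspend -> close -> reply -> destroy with the fixed logic. */
static struct fixed_queue replay_trace_fixed(void)
{
    struct fixed_queue q = { .registered = true };
    q.suspend_pending = true;         /* suspend request (opcode=3) */
    q.pending_disable = true;         /* disable_scheduling sent */
    q.suspend_pending = false;        /* close signals the suspend fence */
    (void)try_mark_destroyed(&q);     /* fails: disable still in flight */
    reply_handler_fixed(&q);          /* not destroyed, so no deregister */
    (void)try_mark_destroyed(&q);     /* succeeds once the disable completed */
    return q;
}
```

In this model the replay ends with the queue marked destroyed but not yet deregistered. The normal teardown would then deregister it exactly once.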

(cherry picked from commit f161809b362f027b6d72bd998e47f8f0bad60a2e)


Analysis

by VulDB Data Team • 12/23/2025

CVE-2024-56552 affects the Linux kernel's Intel Xe graphics driver, specifically its GuC submission backend (drm/xe/guc_submit). The GuC (Graphics microcontroller) is the firmware-driven scheduler that Intel Xe hardware uses to manage graphics workloads, and the driver tracks each execution queue's state while exchanging messages with it. The flaw is a race condition between suspending and closing the same execution queue. When a suspend request and a queue close overlap, the driver's internal state tracking for that queue becomes inconsistent, which leads to assertion failures.

The root cause is a race between the suspend_pending flag and the handling of the disable_scheduling response. When a queue suspension is initiated, the driver sets suspend_pending and sends a disable_scheduling request to the GuC. If the queue is closed before the GuC replies, the close path kills the queue and forcefully signals the suspend fence, and signalling the fence clears suspend_pending. When the reply to disable_scheduling then arrives, the handler no longer sees a pending suspend. It treats the disable as part of teardown and deregisters the queue, even though the queue has not been marked for destruction and the disable was only meant to suspend it. This trips the exec_queue_destroyed(q) assertion, reported as a warning from xe_guc_sched_done_handler (xe_guc_submit.c:1826 in the reporter's build). It is followed by the DEREGISTER_DONE "Unexpected engine state" error.
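Whether the handler misfires depends only on which of two events happens first: the close path signalling the suspend fence, or the GuC's reply arriving. The toy model below runs both orders against the pre-fix decision logic. The flag names and functions (F_SUSPEND_PENDING, close_signals_fence, reply_handler_misfires, misfires) are hypothetical and only approximate the driver's behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags for a toy model; not the driver's definitions. */
#define F_SUSPEND_PENDING  (1u << 0)
#define F_PENDING_DISABLE  (1u << 1)
#define F_DESTROYED        (1u << 2)

/* Event A: the close path signals the suspend fence, clearing suspend_pending. */
static unsigned int close_signals_fence(unsigned int s)
{
    return s & ~F_SUSPEND_PENDING;
}

/* Event B: pre-fix reply handler. Returns true if it would deregister a queue
 * that is not marked destroyed (the condition behind the assertion). */
static bool reply_handler_misfires(unsigned int *s)
{
    bool suspend = (*s & F_SUSPEND_PENDING) != 0;
    *s &= ~(F_PENDING_DISABLE | F_SUSPEND_PENDING);
    return !suspend && (*s & F_DESTROYED) == 0;
}

/* Start from "suspend requested, disable in flight" and apply A and B in
 * the chosen order. */
static bool misfires(bool close_before_reply)
{
    unsigned int s = F_SUSPEND_PENDING | F_PENDING_DISABLE;
    bool bad;

    if (close_before_reply) {
        s = close_signals_fence(s);
        bad = reply_handler_misfires(&s);
    } else {
        bad = reply_handler_misfires(&s);
        s = close_signals_fence(s);
    }
    return bad;
}
```

Only the close-first interleaving, the one in the ftrace snippet, produces the bad deregistration. This is why the bug shows up intermittently rather than on every close.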

The impact goes beyond a single assertion failure. Because the same queue can be deregistered twice, the driver's internal bookkeeping for that queue becomes inconsistent. This produces kernel warnings and errors and could potentially lead to further misbehaviour, such as hangs, while GPU workloads are being processed. Triggering the race requires a queue close to overlap an in-flight suspend, so it is most likely when workloads manage queues concurrently; the upstream report observed it in driver test cases. Only systems running the Xe driver on Intel Xe graphics hardware are affected.

The fix changes the order of operations in the handler for the G2H (GuC-to-Host) response to disable_scheduling. The handler now samples whether the queue has been marked destroyed before clearing pending_disable, and it deregisters only a queue that is actually marked destroyed. As a result, a disable issued purely to suspend the queue no longer turns into a deregistration. In addition, the destruction path now waits for any pending disable to complete before marking the queue as destroyed, so the reply handler and the destroy path cannot both deregister the same queue. The vulnerability maps to CWE-362 (Concurrent Execution using Shared Resource with Improper Synchronization, "Race Condition"). VulDB also associates it with ATT&CK technique T1059.007 (Command and Scripting Interpreter: JavaScript).
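The destroy-side half of the fix, waiting for an in-flight disable before marking the queue destroyed, can be sketched with POSIX threads. The mutex and condition variable stand in for the driver's state bits and wait queue; the kernel uses its own primitives, not pthreads. struct model_queue, reply_thread, destroy_thread and run_race are hypothetical. In this model the destroy path performs the single deregistration itself, which simplifies the driver's real teardown sequence.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of the fixed destroy ordering. */
struct model_queue {
    pthread_mutex_t lock;
    pthread_cond_t  wq;              /* stands in for the driver's wait queue */
    bool suspend_pending;
    bool pending_disable;
    bool destroyed;
    int  deregisters;
};

/* Reply handler for "scheduling disabled" (fixed ordering). Under this
 * mutex the sample/clear order is not observable; in the driver the state
 * bits are not protected by one common lock, which is why the order matters. */
static void *reply_thread(void *arg)
{
    struct model_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    if (q->suspend_pending) {
        q->suspend_pending = false;  /* signal the suspend fence */
        q->pending_disable = false;
    } else {
        bool destroyed = q->destroyed;   /* sample first ... */
        q->pending_disable = false;      /* ... then clear */
        if (destroyed)
            q->deregisters++;
    }
    pthread_cond_broadcast(&q->wq);  /* wake a destroy path waiting on us */
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Destroy path: wait for any in-flight disable, then mark destroyed and
 * perform the (single) deregistration. */
static void *destroy_thread(void *arg)
{
    struct model_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    while (q->pending_disable)       /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->wq, &q->lock);
    q->destroyed = true;
    q->deregisters++;
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Post-close scenario: suspend fence already signalled, disable still in
 * flight, reply and destroy racing. Returns the number of deregistrations,
 * or -1 if a thread could not be created. */
static int run_race(void)
{
    struct model_queue q = { .suspend_pending = false, .pending_disable = true };
    pthread_t a, b;
    int result = -1;

    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.wq, NULL);
    if (pthread_create(&a, NULL, reply_thread, &q) == 0) {
        if (pthread_create(&b, NULL, destroy_thread, &q) == 0) {
            pthread_join(b, NULL);
            result = 0;
        }
        pthread_join(a, NULL);
        if (result == 0)
            result = q.deregisters;
    }
    pthread_cond_destroy(&q.wq);
    pthread_mutex_destroy(&q.lock);
    return result;
}
```

Whichever thread is scheduled first, run_race() returns 1. The destroy thread cannot set destroyed while the disable is still pending, so the reply handler never sees destroyed and never deregisters. Only the destroy path does.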

Responsible: Linux
Reservation: 12/27/2024
Disclosure: 12/27/2024
Moderation: accepted
CPE: ready
EPSS: 0.00015
KEV: no
Activities: very low
