CVE-2023-53656 in Linux
Summary
by MITRE • 10/07/2025
In the Linux kernel, the following vulnerability has been resolved:
drivers/perf: hisi: Don't migrate perf to the CPU going to teardown
The driver needs to migrate the perf context when the CPU it is currently using is about to be torn down. When the cpuhp::teardown() callback is called, cpu_online_mask() has not been updated yet and still includes the CPU going to teardown. In the current driver implementation the context may therefore be migrated to the teardown CPU, which leads to the call trace below:
... [ 368.104662][ T932] task:cpuhp/0 state:D stack: 0 pid: 15 ppid: 2 flags:0x00000008
[ 368.113699][ T932] Call trace:
[ 368.116834][ T932] __switch_to+0x7c/0xbc
[ 368.120924][ T932] __schedule+0x338/0x6f0
[ 368.125098][ T932] schedule+0x50/0xe0
[ 368.128926][ T932] schedule_preempt_disabled+0x18/0x24
[ 368.134229][ T932] __mutex_lock.constprop.0+0x1d4/0x5dc
[ 368.139617][ T932] __mutex_lock_slowpath+0x1c/0x30
[ 368.144573][ T932] mutex_lock+0x50/0x60
[ 368.148579][ T932] perf_pmu_migrate_context+0x84/0x2b0
[ 368.153884][ T932] hisi_pcie_pmu_offline_cpu+0x90/0xe0 [hisi_pcie_pmu]
[ 368.160579][ T932] cpuhp_invoke_callback+0x2a0/0x650
[ 368.165707][ T932] cpuhp_thread_fun+0xe4/0x190
[ 368.170316][ T932] smpboot_thread_fn+0x15c/0x1a0
[ 368.175099][ T932] kthread+0x108/0x13c
[ 368.179012][ T932] ret_from_fork+0x10/0x18
...
Use cpumask_any_but() to find a correct active CPU and fix this issue.
Analysis
by VulDB Data Team • 03/01/2026
CVE-2023-53656 affects the Linux kernel's perf driver for the HiSilicon PCIe PMU (hisi_pcie_pmu). The flaw lies in how the driver handles CPU hotplug: when the CPU that currently owns the perf context is taken offline, the driver can select that same CPU as the migration target, and the hotplug thread then blocks. The result is system instability and a potential denial of service. The vulnerability is classified under CWE-362 (race condition) and is mapped to ATT&CK technique T1490 (Inhibit System Recovery).
The root cause lies in the driver's cpuhp teardown callback, hisi_pcie_pmu_offline_cpu(), which migrates the perf context away from a CPU that is about to go offline. When this callback runs, the online mask still includes the CPU being torn down because the mask has not been updated yet. A migration target chosen from that mask can therefore be the outgoing CPU itself. The call trace shows the consequence: the hotplug thread cpuhp/0 is in uninterruptible sleep (state D), blocked in mutex_lock() called from perf_pmu_migrate_context().
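The stale-mask problem described above can be illustrated with a small userspace model. This is a sketch in Python, not kernel code: the online mask is modelled as a set of CPU numbers, and `pick_first_online` is a hypothetical stand-in for any selection that consults only the online mask. The driver's actual pre-fix selection logic is not shown in this entry.

```python
def pick_first_online(online_mask):
    """Hypothetical selection that looks only at the online mask."""
    return min(online_mask) if online_mask else None


# Model of the state seen by the teardown callback: CPU 0 is going down,
# but the online mask has not been updated yet and still contains it.
online_mask = {0, 1, 2, 3}
outgoing_cpu = 0

target = pick_first_online(online_mask)

# The chosen target is the CPU being torn down, so the migration
# would move the perf context onto the outgoing CPU itself.
picked_outgoing = (target == outgoing_cpu)
```

In this model the selection has no way to know which CPU is leaving, which is why the outgoing CPU has to be excluded explicitly.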
The operational impact is a loss of availability. When a CPU is taken offline for maintenance, power management, or other reasons, the hotplug operation can hang if the perf context is migrated to the outgoing CPU, and the system may become unresponsive. Only systems that use the HiSilicon PCIe PMU driver are affected, in particular those that rely on perf for diagnostics, resource management, and profiling. This includes both server and embedded systems built on this hardware platform. An actor who is able to trigger CPU hotplug on such a system could use the flaw to cause a denial of service.
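As a first triage step, one can check whether the affected driver is loaded as a module. The following is a minimal sketch, assuming the driver is built as a loadable module named `hisi_pcie_pmu` (the name that appears in the call trace) and that the text passed in has the usual `/proc/modules` layout with the module name in the first column. The helper `module_loaded` and the sample line are illustrative and not part of any kernel tooling.

```python
def module_loaded(proc_modules_text, name="hisi_pcie_pmu"):
    """Return True if `name` appears as a module name in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field is the module name.
        if fields and fields[0] == name:
            return True
    return False


# Made-up sample input for illustration only.
sample = "hisi_pcie_pmu 20480 0 - Live 0x0000000000000000\n"
sample_has_driver = module_loaded(sample)
```

On a live system the input would come from reading `/proc/modules`. A negative result is not conclusive: a driver compiled into the kernel image does not appear in that file.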
The fix selects the migration target with cpumask_any_but(), which returns a CPU from the given mask other than the one being torn down. The perf context is therefore migrated to a CPU that remains online rather than to the outgoing one. This follows the usual rule for CPU hotplug callbacks in kernel drivers: a CPU that is in a transition state should not be chosen as the target of an operation. With the change in place, the perf subsystem handles CPU hotplug events on this hardware without blocking the hotplug thread.
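The selection rule behind the fix can be modelled in a few lines. This is a userspace sketch in Python, not the kernel implementation: `any_but` is a hypothetical analogue of cpumask_any_but() that operates on a set of CPU numbers. It returns the lowest remaining CPU to keep the result deterministic, and it returns `None` when no other CPU exists, whereas the kernel helper signals that case with a value at or above nr_cpu_ids.

```python
def any_but(mask, excluded_cpu):
    """Return some CPU in `mask` other than `excluded_cpu`, or None."""
    candidates = sorted(cpu for cpu in mask if cpu != excluded_cpu)
    return candidates[0] if candidates else None


# CPU 0 is being torn down but is still present in the online mask.
online_mask = {0, 1, 2, 3}
outgoing_cpu = 0

target = any_but(online_mask, outgoing_cpu)

# When the outgoing CPU is the only one in the mask there is no valid
# target, and the caller must skip the migration.
no_target = any_but({0}, outgoing_cpu)
```

The caller has to handle the no-target case explicitly, since migrating to an invalid CPU would be no better than migrating to the outgoing one.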