CVE-2025-39953 in Linux
Summary
by MITRE • 10/04/2025
In the Linux kernel, the following vulnerability has been resolved:
cgroup: split cgroup_destroy_wq into 3 workqueues
A hung task can occur during [1] LTP cgroup testing when repeatedly
mounting/unmounting perf_event and net_prio controllers with systemd.unified_cgroup_hierarchy=1. The hang manifests in cgroup_lock_and_drain_offline() during root destruction.
Related case:
cgroup_fj_function_perf_event cgroup_fj_function.sh perf_event
cgroup_fj_function_net_prio cgroup_fj_function.sh net_prio
Call Trace:
cgroup_lock_and_drain_offline+0x14c/0x1e8
cgroup_destroy_root+0x3c/0x2c0
css_free_rwork_fn+0x248/0x338
process_one_work+0x16c/0x3b8
worker_thread+0x22c/0x3b0
kthread+0xec/0x100
ret_from_fork+0x10/0x20
Root Cause:
CPU0                            CPU1
mount perf_event                umount net_prio
cgroup1_get_tree                cgroup_kill_sb
rebind_subsystems               // root destruction enqueues
                                // cgroup_destroy_wq
// kill all perf_event css
                                // one perf_event css A is dying
                                // css A offline enqueues cgroup_destroy_wq
                                // root destruction will be executed first
                                css_free_rwork_fn
                                cgroup_destroy_root
                                cgroup_lock_and_drain_offline
                                // some perf descendants are dying
                                // cgroup_destroy_wq max_active = 1
                                // waiting for css A to die
Problem scenario:
1. CPU0 mounts perf_event (rebind_subsystems)
2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
3. A dying perf_event CSS gets queued for offline after root destruction
4. Root destruction waits for offline completion, but offline work is blocked behind root destruction in cgroup_destroy_wq (max_active=1)
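The ordering problem in the scenario above can be reproduced in user space. The following is a minimal sketch, not kernel code: a single Python worker thread stands in for a workqueue with max_active=1, and the names destroy_wq, root_destruction, css_a_offline and css_a_offlined are hypothetical stand-ins chosen for illustration. The timeout exists only so the demonstration terminates; the kernel wait has no such limit.

```python
import queue
import threading


def run_single_queue(timeout=0.5):
    """Model one workqueue that runs a single item at a time.

    Returns True if root destruction saw css A go offline before the
    timeout, False if it was still waiting when the timeout expired.
    """
    destroy_wq = queue.Queue()          # stand-in for cgroup_destroy_wq
    css_a_offlined = threading.Event()  # set once css A has gone offline
    result = {}

    def root_destruction():
        # Analogue of the wait in cgroup_lock_and_drain_offline().
        # Assumption: a timeout is used here only to keep the demo finite.
        result["root_destroyed"] = css_a_offlined.wait(timeout)

    def css_a_offline():
        css_a_offlined.set()

    def worker():
        # One worker: the next item starts only after the current returns.
        while True:
            item = destroy_wq.get()
            if item is None:
                break
            item()

    # Root destruction is queued first, css A's offline work second.
    destroy_wq.put(root_destruction)
    destroy_wq.put(css_a_offline)
    destroy_wq.put(None)  # sentinel: stop the worker

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result["root_destroyed"]


if __name__ == "__main__":
    print("root destruction completed its wait:", run_single_queue())
```

Because css_a_offline cannot start until root_destruction returns, the wait always runs to its timeout and the function returns False. Without the timeout, the worker would block indefinitely, which is the hung task in the call trace.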
Solution: Split cgroup_destroy_wq into three dedicated workqueues:
cgroup_offline_wq – Handles CSS offline operations
cgroup_release_wq – Manages resource release
cgroup_free_wq – Performs final memory deallocation
This separation eliminates blocking in the CSS free path while waiting for offline operations to complete.
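A user-space sketch of why the split helps, under stated assumptions: each Python worker thread stands in for one workqueue, root destruction is placed on the free queue because the call trace shows it running from css_free_rwork_fn, and the function names below are hypothetical. cgroup_release_wq is left out because it plays no part in this particular wait.

```python
import queue
import threading


def run_split_queues(timeout=2.0):
    """Model separate offline and free workqueues, one worker each.

    Returns True if root destruction saw css A go offline before the
    timeout expired.
    """
    offline_wq = queue.Queue()  # stand-in for cgroup_offline_wq
    free_wq = queue.Queue()     # stand-in for cgroup_free_wq
    css_a_offlined = threading.Event()
    result = {}

    def root_destruction():
        # Analogue of the wait in cgroup_lock_and_drain_offline().
        result["root_destroyed"] = css_a_offlined.wait(timeout)

    def css_a_offline():
        css_a_offlined.set()

    def worker(wq):
        # Each queue still runs one item at a time, as with max_active=1.
        while True:
            item = wq.get()
            if item is None:
                break
            item()

    # Same arrival order as in the problem scenario, but on separate queues.
    free_wq.put(root_destruction)
    offline_wq.put(css_a_offline)
    free_wq.put(None)
    offline_wq.put(None)

    threads = [
        threading.Thread(target=worker, args=(wq,))
        for wq in (free_wq, offline_wq)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["root_destroyed"]


if __name__ == "__main__":
    print("root destruction completed its wait:", run_split_queues())
```

Here the offline work runs on its own worker while root destruction waits, so the wait is satisfied and the function returns True. Each queue remains strictly ordered; only the cross-phase dependency has been moved off a shared queue.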
[1] https://github.com/linux-test-project/ltp/blob/master/runtest/controllers
Analysis
by VulDB Data Team • 01/24/2026
CVE-2025-39953 affects the Linux kernel's control group (cgroup) subsystem, specifically the workqueue used during cgroup destruction. It can produce a hung task when the perf_event and net_prio controllers are repeatedly mounted and unmounted, as the LTP cgroup tests do, on a system booted with systemd.unified_cgroup_hierarchy=1. A single workqueue, cgroup_destroy_wq, carried every stage of destruction, and with max_active=1 it runs only one work item at a time, so an item that waits on another item queued behind it can never make progress. The result is a loss of availability in the cgroup teardown path rather than a data exposure.
The flaw stems from routing CSS offline processing, resource release, and final memory deallocation through the same cgroup_destroy_wq. In the reported race, CPU0 mounts perf_event while CPU1 unmounts net_prio, and root destruction work is queued ahead of the offline work for a dying perf_event CSS. When root destruction runs, cgroup_lock_and_drain_offline waits for that CSS to finish going offline, but the offline work sits behind it in the same queue and cannot start until the current item returns. Neither can proceed, so the destruction path is effectively waiting on itself. Although the trigger is a race between mount and unmount, the defect itself is a deadlock (CWE-833) rather than the Time-of-Check to Time-of-Use (TOCTOU) condition that CWE-367 describes.
The operational impact is a hung task in the cgroup teardown path. The call trace shows a kernel worker thread (worker_thread, process_one_work) running css_free_rwork_fn and blocking in cgroup_lock_and_drain_offline by way of cgroup_destroy_root. While that worker is stuck, work queued behind it on cgroup_destroy_wq does not run, so later cgroup teardown stalls as well. Exposure is highest where cgroup hierarchies are created and destroyed frequently, such as container hosts, orchestration platforms, and performance monitoring setups that mount and unmount controllers.
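To check whether a captured kernel log contains this particular trace, the three innermost frames can be matched in order. This is a minimal sketch: the function name matches_hang_signature and the choice of three frames are assumptions made for illustration, and a match only indicates a similar stack, not proof that this CVE is the cause.

```python
import re

# Innermost frames first, as they appear in the call trace.
SIGNATURE = (
    "cgroup_lock_and_drain_offline",
    "cgroup_destroy_root",
    "css_free_rwork_fn",
)


def matches_hang_signature(log_text):
    """Return True if the signature frames appear in order in log_text.

    Each frame must be followed by the usual +0xOFFSET/0xSIZE suffix.
    """
    pos = 0
    for frame in SIGNATURE:
        pattern = r"\b" + re.escape(frame) + r"\+0x[0-9a-f]+/0x[0-9a-f]+"
        match = re.search(pattern, log_text[pos:])
        if match is None:
            return False
        pos += match.end()  # continue searching after this frame
    return True


if __name__ == "__main__":
    trace = (
        "cgroup_lock_and_drain_offline+0x14c/0x1e8 "
        "cgroup_destroy_root+0x3c/0x2c0 "
        "css_free_rwork_fn+0x248/0x338 "
        "process_one_work+0x16c/0x3b8"
    )
    print(matches_hang_signature(trace))
```

Offsets differ between kernel builds, so the pattern accepts any hexadecimal offset and size rather than the exact values from the report.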
The fix splits cgroup_destroy_wq into three dedicated workqueues, one per phase of destruction: cgroup_offline_wq handles CSS offline operations, cgroup_release_wq manages resource release, and cgroup_free_wq performs final memory deallocation. Because offline work no longer shares a queue with the free path, root destruction running from css_free_rwork_fn can wait for dying CSSes while their offline work proceeds on cgroup_offline_wq, which removes the circular wait. From a security standpoint, the change closes a denial-of-service condition that could be triggered through deliberate cgroup manipulation. In ATT&CK terms the closest mapping is an availability impact such as T1489 (Service Stop).
With the three queues in place, offline operations, resource release, and memory deallocation no longer serialize behind one another, so the wait in the free path cannot be starved by its own queue. The change is internal to the kernel's cgroup code and does not alter the user-visible cgroup interface. Organizations running Linux systems that repeatedly mount and unmount cgroup controllers with the unified cgroup hierarchy enabled, particularly container hosts, should apply a kernel update that includes this fix.