CVE-2022-49327 in Linux
Summary
by MITRE • 02/26/2025
In the Linux kernel, the following vulnerability has been resolved:
bcache: avoid journal no-space deadlock by reserving 1 journal bucket
The journal no-space deadlock has been reported from time to time. Such a deadlock can happen in the following situation.
When all journal buckets are completely filled by active jsets under a heavy write I/O load, cache set registration (after a reboot) loads all active jsets and inserts them into the btree again (this is called journal replay). If inserting a journaled bkey into a btree node causes a node split, a new journal request might be triggered. For example, if the btree grows by one level after the split, the root node record in the cache device super block is updated by bch_journal_meta(), called from bch_btree_set_root(). But there is no space left in the journal buckets, so journal replay has to wait for a journal bucket to be reclaimed, which can only happen after at least one journal bucket has been replayed. This is one example of how the journal no-space deadlock happens.
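To make the circular wait concrete, here is a minimal user-space model in C. It is a sketch under stated assumptions, not bcache code: `struct journal_model`, `journal_get_bucket()` and `replay_oldest_bucket()` are hypothetical names, the journal is reduced to a count of occupied buckets, and a key that grows the btree is assumed to need exactly one extra journal write (standing in for the bch_journal_meta() call described above).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified journal model: only bucket counts are tracked. */
struct journal_model {
	unsigned int nbuckets; /* total journal buckets on the cache device */
	unsigned int used;     /* buckets holding jsets that still need replay */
};

enum replay_result { REPLAY_OK, REPLAY_BLOCKED };

/* Take a free bucket for a new journal write; false means "no space". */
static bool journal_get_bucket(struct journal_model *j)
{
	assert(j->used <= j->nbuckets);
	if (j->used == j->nbuckets)
		return false;
	j->used++;
	return true;
}

/*
 * Replay the oldest journal bucket. If one of its keys splits a btree node
 * and grows the tree, an extra journal write is needed first (modelling
 * bch_journal_meta() called from bch_btree_set_root()). The replayed bucket
 * can only be reclaimed once replay of it has finished.
 */
static enum replay_result replay_oldest_bucket(struct journal_model *j,
					       bool causes_root_update)
{
	if (causes_root_update && !journal_get_bucket(j))
		return REPLAY_BLOCKED; /* the kernel would wait here: deadlock */
	assert(j->used > 0);
	j->used--;                     /* oldest bucket replayed and reclaimed */
	return REPLAY_OK;
}
```

With every bucket occupied, a replay step that needs a root update reports `REPLAY_BLOCKED` and nothing is freed; with a single spare bucket the same step succeeds and the occupied count is unchanged afterwards.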
The solution to avoid the deadlock is to reserve 1 journal bucket at run time and permit the reserved bucket to be used only during the cache set registration procedure, for things like journal replay. Then the journal space will never be completely filled, so the journal no-space deadlock has no chance to happen anymore.
This patch adds a new member "bool do_reserve" to struct journal. It is initialized to 0 (false) when struct journal is allocated, and set to 1 (true) by bch_journal_space_reserve() after all initialization is done in run_cache_set(). At run time, when journal_reclaim() tries to allocate a new journal bucket, free_journal_buckets() is called to check whether there are enough free journal buckets to use. If there is only 1 free journal bucket and journal->do_reserve is 1 (true), the last bucket is reserved and free_journal_buckets() returns 0 to indicate that no journal bucket is free. journal_reclaim() then gives up and tries again next time to see whether there is a free journal bucket to allocate. By this method, 1 journal bucket is always reserved at run time.
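The run-time check can be sketched as a simplified user-space model. It is not the kernel source, whose version operates on bcache's own data structures: here `struct journal_model`, its `nbuckets`/`used` fields and `journal_reclaim_model()` are hypothetical, and `free_journal_buckets()` only borrows the kernel function's name for readability, with an illustrative signature.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for bcache's journal state (not the kernel struct). */
struct journal_model {
	unsigned int nbuckets; /* total journal buckets */
	unsigned int used;     /* buckets holding active jsets */
	bool do_reserve;       /* mirrors journal->do_reserve from the patch */
};

/*
 * Number of buckets reclaim may hand out. When do_reserve is set, one free
 * bucket is withheld, so a single remaining free bucket yields 0.
 */
static unsigned int free_journal_buckets(const struct journal_model *j)
{
	unsigned int n;
	unsigned int reserved = j->do_reserve ? 1u : 0u;

	assert(j->used <= j->nbuckets);
	n = j->nbuckets - j->used;
	return n > reserved ? n - reserved : 0;
}

/*
 * Hypothetical reclaim step: take a new bucket if one is available,
 * otherwise give up and let the caller retry on a later reclaim.
 */
static bool journal_reclaim_model(struct journal_model *j)
{
	if (free_journal_buckets(j) == 0)
		return false; /* give up for now; try again next time */
	j->used++;
	return true;
}
```

Returning 0 instead of blocking keeps the reclaim path simple: the caller just retries later, which matches the behaviour the patch description gives for journal_reclaim().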
During the cache set registration, journal->do_reserve is 0 (false), so the reserved journal bucket can be used to avoid the no-space deadlock.
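The ordering matters: the flag must stay false while replay runs and only become true afterwards. The following C model illustrates this under loudly stated assumptions: `register_cache_set_model()` and its `reserve_during_replay` parameter are hypothetical, and every replayed bucket is assumed to trigger exactly one extra journal write for a root update.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical journal model; field names are illustrative only. */
struct journal_model {
	unsigned int nbuckets; /* total journal buckets */
	unsigned int used;     /* buckets holding active jsets */
	bool do_reserve;       /* mirrors journal->do_reserve from the patch */
};

/* Free buckets available for allocation, honouring the reservation. */
static unsigned int free_buckets(const struct journal_model *j)
{
	unsigned int n = j->nbuckets - j->used;
	unsigned int reserved = j->do_reserve ? 1u : 0u;

	return n > reserved ? n - reserved : 0;
}

/*
 * Hypothetical model of the registration sequence. reserve_during_replay
 * selects the flag value used while replaying; the patch keeps it false
 * and sets it only afterwards, via bch_journal_space_reserve(), once
 * initialization in run_cache_set() is done.
 */
static bool register_cache_set_model(struct journal_model *j,
				     bool reserve_during_replay)
{
	unsigned int pending = j->used; /* buckets left to replay */

	assert(j->used <= j->nbuckets);
	j->do_reserve = reserve_during_replay;
	while (pending > 0) {
		if (free_buckets(j) == 0)
			return false; /* replay would wait forever */
		/*
		 * The root-update jset takes a free bucket and the replayed
		 * bucket is reclaimed, so the occupied count is unchanged.
		 */
		pending--;
	}
	j->do_reserve = true; /* run time: keep one bucket back */
	return true;
}
```

A journal left with one spare bucket by a patched kernel replays successfully when the flag is false during registration, but would block if the reservation were enforced during replay; a completely full journal blocks either way.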
Analysis
by VulDB Data Team • 02/26/2025
The vulnerability described in CVE-2022-49327 affects the Linux kernel's bcache subsystem and addresses a deadlock that can occur during journal replay. It is a significant reliability concern for systems that use bcache, because under specific workload patterns cache set registration can hang indefinitely. The bcache subsystem provides caching for block devices, and this vulnerability shows how the interaction between journal space management and btree updates can create a circular wait that prevents normal operation.
The flaw manifests as a journal no-space deadlock during cache set registration after a reboot. When active journal sets (jsets) fill all available journal buckets under heavy write I/O, the journal replay process cannot make progress. During replay, inserting journaled bkeys can split btree nodes; if a split adds a level to the btree, bch_btree_set_root() calls bch_journal_meta() to record the new root in the cache device super block, which requires a new journal entry. Replay therefore needs a free journal bucket to continue, but buckets are only reclaimed after their contents have been replayed, creating a circular dependency in which no space can ever be freed.
The operational impact is severe: cache set registration, which typically runs after a reboot when bcache reconstructs its state from persistent journal data, can hang indefinitely. Resolving the hang requires manual intervention; because the full journal persists on disk, a plain reboot may simply hit the same deadlock again. The vulnerability affects bcache deployments with heavy write workloads, making it particularly problematic for storage servers, database systems, and other environments that expect high write throughput. The condition is best classified as a resource-exhaustion deadlock (CWE-833: Deadlock) rather than a Time-of-Check Time-of-Use issue (CWE-367).
The patch mitigates the issue by reserving one journal bucket during normal run-time operation, so that journal space can never be completely exhausted. It introduces a new boolean flag, do_reserve, in struct journal that controls when the reservation applies. The flag starts as false, so journal replay during cache set registration may use the reserved bucket; once initialization in run_cache_set() is complete, bch_journal_space_reserve() sets it to true and the reservation takes effect. Because the journal is never filled at run time, at least one bucket is always available for the journal writes that replay may trigger after the next reboot, which prevents the deadlock. The check itself is simple: when journal_reclaim() asks for a new bucket, free_journal_buckets() returns 0 if only one free bucket remains and do_reserve is set, so the last bucket stays unused.
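The key invariant, that run-time allocation never consumes the last bucket, can be checked with a small randomized C model. This is an illustrative sketch, not a test of the kernel: `min_unused_under_load()` is a hypothetical driver, the 3-in-4 write bias is an arbitrary assumption, and `nbuckets` is assumed to be at least 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical journal model; names are illustrative only. */
struct journal_model {
	unsigned int nbuckets; /* total journal buckets (assumed >= 1) */
	unsigned int used;     /* buckets holding active jsets */
	bool do_reserve;       /* reservation flag, true at run time */
};

/* Buckets reclaim may hand out; withholds one bucket when do_reserve. */
static unsigned int free_journal_buckets(const struct journal_model *j)
{
	unsigned int n = j->nbuckets - j->used;
	unsigned int reserved = j->do_reserve ? 1u : 0u;

	return n > reserved ? n - reserved : 0;
}

/*
 * Drive a random mix of bucket allocations (heavy writes) and releases
 * (buckets whose jsets are no longer needed) at run time, and return the
 * smallest number of physically unused buckets observed.
 */
static unsigned int min_unused_under_load(unsigned int nbuckets,
					  unsigned int steps, unsigned int seed)
{
	struct journal_model j = { .nbuckets = nbuckets, .used = 0,
				   .do_reserve = true };
	unsigned int min_unused = nbuckets;

	srand(seed);
	for (unsigned int i = 0; i < steps; i++) {
		bool write = (rand() % 4) != 0; /* roughly 3 of 4 steps write */

		if (write) {
			if (free_journal_buckets(&j) > 0)
				j.used++;
		} else if (j.used > 0) {
			j.used--;
		}
		if (j.nbuckets - j.used < min_unused)
			min_unused = j.nbuckets - j.used;
	}
	return min_unused;
}
```

Whatever the seed, the minimum never drops below 1 in this model, which is the margin the replay path relies on after the next reboot.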
From an attack-modeling perspective, the issue relates to denial of service rather than privilege escalation: in ATT&CK terms it is closest to Endpoint Denial of Service (T1499), since an attacker able to generate sustained heavy writes to a bcache device on an unpatched kernel could potentially leave the journal full and cause registration to hang on the next boot. The patch follows a standard defensive resource-management practice: it keeps a minimum reserve of journal space so that critical metadata operations, such as recording a new btree root during replay, can always complete regardless of run-time workload or resource utilization.