CVE-2022-48628 in Linux
Summary
by MITRE • 03/03/2024
In the Linux kernel, the following vulnerability has been resolved:
ceph: drop messages from MDS when unmounting
When unmounting, all the dirty buffers will be flushed, and after the last OSD request has finished, the last reference on i_count will be released. The dirty caps/snaps are then flushed to the MDSs, but the unmount does not wait for the possible acks. Handling those acks will ihold() the inodes while updating the metadata locally, which no longer makes sense at this point. As a result, evict_inodes() skips these inodes.
If encryption is enabled, the kernel generates a warning when removing the encryption keys while the skipped inodes still hold the keyring:
WARNING: CPU: 4 PID: 168846 at fs/crypto/keyring.c:242 fscrypt_destroy_keyring+0x7e/0xd0
CPU: 4 PID: 168846 Comm: umount Tainted: G S 6.1.0-rc5-ceph-g72ead199864c #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
RIP: 0010:fscrypt_destroy_keyring+0x7e/0xd0
RSP: 0018:ffffc9000b277e28 EFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff88810d52ac00 RCX: ffff88810b56aa00
RDX: 0000000080000000 RSI: ffffffff822f3a09 RDI: ffff888108f59000
RBP: ffff8881d394fb88 R08: 0000000000000028 R09: 0000000000000000
R10: 0000000000000001 R11: 11ff4fe6834fcd91 R12: ffff8881d394fc40
R13: ffff888108f59000 R14: ffff8881d394f800 R15: 0000000000000000
FS: 00007fd83f6f1080(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f918d417000 CR3: 000000017f89a005 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
generic_shutdown_super+0x47/0x120
kill_anon_super+0x14/0x30
ceph_kill_sb+0x36/0x90 [ceph]
deactivate_locked_super+0x29/0x60
cleanup_mnt+0xb8/0x140
task_work_run+0x67/0xb0
exit_to_user_mode_prepare+0x23d/0x240
syscall_exit_to_user_mode+0x25/0x60
do_syscall_64+0x40/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fd83dc39e9b
Later, the kernel will crash when it calls iput() on these inodes and dereferences "sb->s_master_keys", which has already been released by generic_shutdown_super().
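The sequence described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: the classes `Inode` and `SuperBlock` and the functions `handle_late_mds_ack`, `shutdown_super` and `simulate_unmount` are hypothetical names chosen for illustration, and reference counting is reduced to a plain integer. Only `i_count`, `ihold()`, `iput()`, `evict_inodes()`, `generic_shutdown_super()` and `s_master_keys` come from the description.

```python
class Inode:
    def __init__(self, ino):
        self.ino = ino
        self.i_count = 0  # simplified stand-in for the inode reference count


class SuperBlock:
    def __init__(self, inos):
        self.inodes = {ino: Inode(ino) for ino in inos}
        self.master_keys = {"fscrypt-master-key"}  # stand-in for sb->s_master_keys
        self.warnings = []


def handle_late_mds_ack(sb, ino):
    """Model an MDS ack that arrives during unmount and takes a reference (like ihold())."""
    inode = sb.inodes[ino]
    inode.i_count += 1
    return inode


def evict_inodes(sb):
    """Evict unreferenced inodes; inodes that are still referenced are skipped."""
    busy = [i for i in sb.inodes.values() if i.i_count > 0]
    sb.inodes = {i.ino: i for i in busy}
    return busy


def shutdown_super(sb):
    """Model of the shutdown step: evict, then release the keys unconditionally."""
    busy = evict_inodes(sb)
    if busy:
        sb.warnings.append("keyring destroyed while inodes still hold it")
    sb.master_keys = None
    return busy


def iput(sb, inode):
    """Drop a reference; the final put needs the master keys, which may be gone."""
    inode.i_count -= 1
    if inode.i_count == 0 and sb.master_keys is None:
        raise RuntimeError("dereference of released master keys")


def simulate_unmount(late_ack):
    """Return (number of warnings, whether the final iput() hit released keys)."""
    sb = SuperBlock([1, 2, 3])
    held = [handle_late_mds_ack(sb, 2)] if late_ack else []
    shutdown_super(sb)
    crashed = False
    for inode in held:
        try:
            iput(sb, inode)
        except RuntimeError:
            crashed = True
    return len(sb.warnings), crashed
```

In this model, `simulate_unmount(late_ack=False)` completes without a warning, while `simulate_unmount(late_ack=True)` records one warning and then fails on the final `iput()`, mirroring the warning-then-crash order in the description.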
Analysis
by VulDB Data Team • 01/13/2025
This vulnerability affects the Ceph file system client in the Linux kernel and is a race during unmount that can end in a kernel warning and a subsequent crash. During unmount, the kernel flushes all dirty buffers, and the last i_count reference on each inode is released once the final OSD (Object Storage Daemon) request completes. The client then flushes dirty caps and snaps to the MDS (Metadata Server) daemons but does not wait for their acknowledgments. If an ack arrives while the unmount is still in progress, handling it takes a new reference on the inode (ihold()) in order to update metadata locally, even though that update no longer serves any purpose. Because these inodes are referenced again, evict_inodes() skips them.
The consequences show up when encryption is enabled. generic_shutdown_super() destroys the fscrypt keyring while the skipped inodes still hold it, which triggers the warning at fs/crypto/keyring.c:242 in fscrypt_destroy_keyring(). When the remaining references are later dropped with iput(), the kernel dereferences sb->s_master_keys, which generic_shutdown_super() has already released, and crashes.
The root cause can be associated with CWE-362 (race condition). This analysis also lists CWE-415 (double free), although the behavior described upstream is a dereference of an already released structure caused by a late inode reference, which is closer to a use after free (CWE-416) than to freeing the same memory twice. The entry further references ATT&CK technique T1490; that identifier denotes Inhibit System Recovery rather than data destruction, and the upstream description documents a warning and a crash at unmount, not data loss.
The technical flaw is a lack of coordination between late MDS message handling and the teardown of the superblock during unmount. The order of events matters: the keyring is destroyed first, with a warning because inodes still hold it, and the crash comes afterwards, when iput() on one of the skipped inodes dereferences the released s_master_keys. The practical impact is a kernel crash during an otherwise normal unmount of a Ceph file system with encryption enabled; the upstream description does not report data corruption. As the patch title states, the fix drops messages from the MDS when unmounting, so that late acks can no longer take the inode references that cause evict_inodes() to skip inodes before the encryption keys are removed. System administrators running CephFS kernel clients, particularly with encryption enabled, should make sure their kernels include this fix.
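The approach named in the patch title, "drop messages from MDS when unmounting", can be sketched as a guard at the top of the message handler. The following is a simplified model under stated assumptions and does not reproduce the kernel patch: `MdsClient`, its `stopping` flag, and the methods `begin_unmount`, `handle_mds_message`, `finish_message` and `busy_inodes` are hypothetical names. The model also ignores handlers that are already running when unmount begins, which a real implementation would have to account for.

```python
class MdsClient:
    """Toy model of a client that drops MDS messages once unmount has started."""

    def __init__(self):
        self.stopping = False   # hypothetical flag: set when unmount begins
        self.inode_refs = {}    # inode number -> references taken by handlers
        self.dropped = 0        # messages ignored because of the flag

    def begin_unmount(self):
        self.stopping = True

    def handle_mds_message(self, ino):
        """Take an inode reference for a message, unless we are unmounting."""
        if self.stopping:
            self.dropped += 1
            return False
        self.inode_refs[ino] = self.inode_refs.get(ino, 0) + 1
        return True

    def finish_message(self, ino):
        """Release the reference taken by handle_mds_message()."""
        self.inode_refs[ino] -= 1

    def busy_inodes(self):
        """Inodes that eviction would skip because a handler still references them."""
        return sorted(ino for ino, n in self.inode_refs.items() if n > 0)
```

With the guard in place, a message that arrives after `begin_unmount()` is counted in `dropped` and takes no reference, so `busy_inodes()` stays empty and eviction in this model would have nothing to skip.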
The issue underscores the critical nature of proper synchronization in kernel-level filesystem operations and the potential for seemingly simple unmount operations to expose complex memory management issues.