CVE-2025-22089 in Linux
Summary
by MITRE • 04/16/2025
In the Linux kernel, the following vulnerability has been resolved:
RDMA/core: Don't expose hw_counters outside of init net namespace
Commit 467f432a521a ("RDMA/core: Split port and device counter sysfs attributes") accidentally almost exposed hw counters to non-init net namespaces. It didn't expose them fully, as an attempt to read any of those counters leads to a crash like this one:
[42021.807566] BUG: kernel NULL pointer dereference, address: 0000000000000028
[42021.814463] #PF: supervisor read access in kernel mode
[42021.819549] #PF: error_code(0x0000) - not-present page
[42021.824636] PGD 0 P4D 0
[42021.827145] Oops: 0000 [#1] SMP PTI
[42021.830598] CPU: 82 PID: 2843922 Comm: switchto-defaul Kdump: loaded Tainted: G S W I XXX
[42021.841697] Hardware name: XXX
[42021.849619] RIP: 0010:hw_stat_device_show+0x1e/0x40 [ib_core]
[42021.855362] Code: 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 49 89 d0 4c 8b 5e 20 48 8b 8f b8 04 00 00 48 81 c7 f0 fa ff ff <48> 8b 41 28 48 29 ce 48 83 c6 d0 48 c1 ee 04 69 d6 ab aa aa aa 48
[42021.873931] RSP: 0018:ffff97fe90f03da0 EFLAGS: 00010287
[42021.879108] RAX: ffff9406988a8c60 RBX: ffff940e1072d438 RCX: 0000000000000000
[42021.886169] RDX: ffff94085f1aa000 RSI: ffff93c6cbbdbcb0 RDI: ffff940c7517aef0
[42021.893230] RBP: ffff97fe90f03e70 R08: ffff94085f1aa000 R09: 0000000000000000
[42021.900294] R10: ffff94085f1aa000 R11: ffffffffc0775680 R12: ffffffff87ca2530
[42021.907355] R13: ffff940651602840 R14: ffff93c6cbbdbcb0 R15: ffff94085f1aa000
[42021.914418] FS: 00007fda1a3b9700(0000) GS:ffff94453fb80000(0000) knlGS:0000000000000000
[42021.922423] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42021.928130] CR2: 0000000000000028 CR3: 00000042dcfb8003 CR4: 00000000003726f0
[42021.935194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[42021.942257] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[42021.949324] Call Trace:
[42021.951756] <TASK>
[42021.953842] [<ffffffff86c58674>] ? show_regs+0x64/0x70
[42021.959030] [<ffffffff86c58468>] ? __die+0x78/0xc0
[42021.963874] [<ffffffff86c9ef75>] ? page_fault_oops+0x2b5/0x3b0
[42021.969749] [<ffffffff87674b92>] ? exc_page_fault+0x1a2/0x3c0
[42021.975549] [<ffffffff87801326>] ? asm_exc_page_fault+0x26/0x30
[42021.981517] [<ffffffffc0775680>] ? __pfx_show_hw_stats+0x10/0x10 [ib_core]
[42021.988482] [<ffffffffc077564e>] ? hw_stat_device_show+0x1e/0x40 [ib_core]
[42021.995438] [<ffffffff86ac7f8e>] dev_attr_show+0x1e/0x50
[42022.000803] [<ffffffff86a3eeb1>] sysfs_kf_seq_show+0x81/0xe0
[42022.006508] [<ffffffff86a11134>] seq_read_iter+0xf4/0x410
[42022.011954] [<ffffffff869f4b2e>] vfs_read+0x16e/0x2f0
[42022.017058] [<ffffffff869f50ee>] ksys_read+0x6e/0xe0
[42022.022073] [<ffffffff8766f1ca>] do_syscall_64+0x6a/0xa0
[42022.027441] [<ffffffff8780013b>] entry_SYSCALL_64_after_hwframe+0x78/0xe2
The problem can be reproduced using the following steps:
    ip netns add foo
    ip netns exec foo bash
    cat /sys/class/infiniband/mlx4_0/hw_counters/*
The panic occurs because casting the device pointer into an ib_device pointer using container_of() in hw_stat_device_show() is wrong and leads to memory corruption.
However, the real problem is that hw counters should never have been exposed outside of the init net namespace.
Fix this by saving the index of the corresponding attribute group (it might be 1 or 2 depending on the presence of driver-specific attributes) and zeroing the pointer to hw_counters group for compat devices during the initialization.
With this fix applied, hw_counters are not available in a non-init net namespace:
    find /sys/class/infiniband/mlx4_0/ -name hw_counters
    /sys/class/infiniband/mlx4_0/ports/1/hw_counters
    /sys/class/infiniband/mlx4_0/ports/2/hw_counters
    /sys/class/infiniband/mlx4_0/hw_counters

    ip netns add foo
    ip netns exec foo bash
    find /sys/class/infiniband/mlx4_0/ -name hw_counters
Analysis
by VulDB Data Team • 02/15/2026
CVE-2025-22089 affects the RDMA (Remote Direct Memory Access) core subsystem of the Linux kernel, specifically the sysfs attributes that expose hardware counters when network namespaces are in use. Commit 467f432a521a ("RDMA/core: Split port and device counter sysfs attributes") inadvertently made the device-level hw_counters attributes visible in non-initial network namespaces. The exposure was only partial: reading any of these counters from a non-init network namespace does not return data but triggers a kernel NULL pointer dereference and crashes the system. The crash occurs in `hw_stat_device_show`, which uses `container_of()` to cast the device pointer into an `ib_device` pointer. That cast is not valid for the compat devices used in non-init namespaces, so the function operates on memory that is not an `ib_device`. The resulting fault corresponds to CWE-476 (NULL Pointer Dereference). The entry also associates the issue with CWE-20 (Improper Input Validation), although that mapping is loose, since the root cause is a sysfs group exposed in a context where it should not exist.
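The invalid `container_of()` cast can be illustrated in user space. The following is a minimal sketch using Python's `ctypes`, not kernel code: the structure layouts, the field names `ops` and `owner`, and the arena trick that keeps every read in bounds are all assumptions made for illustration. It shows that subtracting a member offset performs no type check, so applying it to a device embedded in a different structure yields a pointer to unrelated memory.

```python
import ctypes


class Device(ctypes.Structure):
    """Stand-in for the kernel's struct device (hypothetical, tiny layout)."""
    _fields_ = [("name", ctypes.c_char * 16)]


class IbDevice(ctypes.Structure):
    """Stand-in for struct ib_device: 'dev' sits at a non-zero offset."""
    _fields_ = [
        ("ops", ctypes.c_uint64 * 4),          # hypothetical leading fields
        ("dev", Device),
        ("hw_stats_data", ctypes.c_void_p),
    ]


class CompatDevice(ctypes.Structure):
    """Stand-in for a compat device: embeds 'dev' but is NOT an IbDevice."""
    _fields_ = [("dev", Device), ("owner", ctypes.c_void_p)]


def container_of(member_addr, container_type, member_name):
    """Subtract the member's offset, as the kernel macro does (no type check)."""
    return member_addr - getattr(container_type, member_name).offset


def demo():
    # One zeroed arena holds both objects so that every read stays in bounds.
    arena = ctypes.create_string_buffer(256)
    real = IbDevice.from_buffer(arena, 0)
    compat = CompatDevice.from_buffer(arena, 128)
    real.hw_stats_data = 0x1000  # pretend the real device has stats attached

    real_dev = ctypes.addressof(real) + IbDevice.dev.offset
    compat_dev = ctypes.addressof(compat) + CompatDevice.dev.offset

    good = IbDevice.from_address(container_of(real_dev, IbDevice, "dev"))
    bogus = IbDevice.from_address(container_of(compat_dev, IbDevice, "dev"))
    return {
        "good_is_real": ctypes.addressof(good) == ctypes.addressof(real),
        "bogus_is_real": ctypes.addressof(bogus) == ctypes.addressof(real),
        "good_stats": good.hw_stats_data,
        # This field overlaps bytes of the compat object, not real stats data.
        "bogus_stats": bogus.hw_stats_data,
    }
```

In this model the miscast structure's `hw_stats_data` field happens to overlap the compat object's zeroed `owner` field, so it reads as NULL. That mirrors, but does not prove, the NULL base pointer visible in the oops.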
The flaw lies in the sysfs attribute handling: the device-level hw_counters group was not restricted to the initial network namespace. When a user in a non-initial network namespace reads these files, for example with `cat /sys/class/infiniband/mlx4_0/hw_counters/*`, the read travels from `ksys_read` and `vfs_read` through `seq_read_iter`, `sysfs_kf_seq_show`, and `dev_attr_show` into `hw_stat_device_show`, where the invalid cast takes place and the page fault handler reports the oops. The fault address is 0x28 (CR2) and RCX is 0 in the register dump, which is consistent with reading a field at offset 0x28 through a NULL pointer taken from memory that is not a valid `ib_device` structure. Because the read crashes instead of returning data, the demonstrated consequence is a kernel crash triggered from a non-init namespace. The source describes no privilege escalation and no successful disclosure of counter values.
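When triaging a similar report, the two most useful fields of the trace are the faulting symbol (the `RIP:` line) and the faulting address (`CR2`). The sketch below extracts them from oops text with regular expressions. The function name `summarize_oops`, the patterns, and the 4096-byte threshold used to flag a likely NULL dereference are assumptions for illustration, and the patterns are only matched against the line formats shown in the trace above.

```python
import re

# Matches e.g. "RIP: 0010:hw_stat_device_show+0x1e/0x40 [ib_core]".
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)
# Matches e.g. "CR2: 0000000000000028".
CR2_RE = re.compile(r"\bCR2: (?P<addr>[0-9a-f]{16})\b")

PAGE_SIZE = 4096  # assumption: faults inside the first page suggest NULL + offset


def summarize_oops(log_text):
    """Return the faulting symbol, module, and fault address found in an oops."""
    rip = RIP_RE.search(log_text)
    cr2 = CR2_RE.search(log_text)
    fault = int(cr2.group("addr"), 16) if cr2 else None
    return {
        "symbol": rip.group("symbol") if rip else None,
        "module": rip.group("module") if rip else None,
        "fault_address": fault,
        "null_deref_likely": fault is not None and fault < PAGE_SIZE,
    }


if __name__ == "__main__":
    sample = (
        "[42021.849619] RIP: 0010:hw_stat_device_show+0x1e/0x40 [ib_core]\n"
        "[42021.928130] CR2: 0000000000000028 CR3: 00000042dcfb8003 "
        "CR4: 00000000003726f0\n"
    )
    print(summarize_oops(sample))
```

A small fault address is only a heuristic for a NULL-based access. It does not by itself identify which pointer was NULL.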
The operational impact is primarily denial of service. According to the commit description, the counters were never fully exposed, because any attempt to read them from a non-init network namespace crashes the kernel. A process in such a namespace that reads the hw_counters files can therefore bring the system down. Had the reads succeeded, they would have revealed hardware utilization and traffic statistics that are meant to be visible only in the initial network namespace, which is why the commit treats the exposure itself as the real problem. The vulnerability affects systems with InfiniBand/RDMA hardware, particularly those that use network namespaces for isolation. The ATT&CK techniques cited for this entry are T1059 (Command and Scripting Interpreter) and T1068 (Exploitation for Privilege Escalation). The sub-technique T1059.001 given in the original mapping refers to PowerShell and does not apply to Linux, and the source describes a crash, not a privilege escalation.
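To check whether a given namespace still shows hw_counters directories, the `find` check from the summary can be scripted. This is a rough sketch: the helper names `find_hw_counters` and `audit` are hypothetical, and the default root `/sys/class/infiniband` is taken from the paths in the summary. It only lists directories and never reads the counter files, since reading them is what triggers the crash on unfixed kernels.

```python
import os


def find_hw_counters(device_dir):
    """List directories named 'hw_counters' below device_dir (like find -name)."""
    hits = []
    # os.walk does not follow symlinks below the starting directory.
    for dirpath, dirnames, _filenames in os.walk(device_dir):
        hits.extend(
            os.path.join(dirpath, name)
            for name in dirnames
            if name == "hw_counters"
        )
    return sorted(hits)


def audit(class_root="/sys/class/infiniband"):
    """Map each RDMA device name to the hw_counters directories visible here."""
    if not os.path.isdir(class_root):
        return {}  # no RDMA devices visible from this namespace
    return {
        name: find_hw_counters(os.path.join(class_root, name))
        for name in sorted(os.listdir(class_root))
    }


if __name__ == "__main__":
    for device, paths in audit().items():
        print(device, paths if paths else "no hw_counters visible")
```

Run inside a non-init network namespace, an empty list for every device would match the post-fix behavior shown in the summary.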
The fix addresses the root cause by restricting the hw_counters sysfs group to the initial network namespace. It saves the index of the hw_counters attribute group, which may be 1 or 2 depending on whether driver-specific attributes are present, and zeroes the pointer to that group for compat devices during initialization. As a result, the hw_counters directory no longer appears under the device in non-init network namespaces, so the faulty `container_of()` path in `hw_stat_device_show` can no longer be reached from there, while the counters remain available in the initial namespace. With the fix applied, both the crash and the unintended exposure are removed.
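The idea of the fix can be modeled in a few lines. The sketch below is a simplified model, not the kernel patch: the group names, the four-slot array, and the helper functions are hypothetical. It assumes a NULL-terminated array of attribute groups in which the hw_counters group, when present, occupies index 1 or 2, and shows how remembering that index lets the compat copy drop the group.

```python
MAX_GROUPS = 4  # hypothetical fixed-size array; None plays the role of NULL


def build_device_groups(has_driver_group, has_hw_counters):
    """Return (groups, hw_index); hw_index 0 means no hw_counters group."""
    groups = [None] * MAX_GROUPS
    groups[0] = "core_attrs"
    next_slot = 1
    if has_driver_group:
        groups[next_slot] = "driver_attrs"
        next_slot += 1
    hw_index = 0
    if has_hw_counters:
        groups[next_slot] = "hw_counters"
        hw_index = next_slot  # saved so the compat copy can clear this slot
    return groups, hw_index


def compat_groups(groups, hw_index):
    """Copy the groups for a compat device and zero the hw_counters slot."""
    copy = list(groups)
    if hw_index:
        copy[hw_index] = None
    return copy


def visible(groups):
    """Walk the array up to the first None, as a NULL-terminated list is read."""
    names = []
    for group in groups:
        if group is None:
            break
        names.append(group)
    return names
```

In this model, `build_device_groups(True, True)` places hw_counters at index 2 and `build_device_groups(False, True)` at index 1. In both cases `visible(compat_groups(...))` omits the group while the original device's array is left unchanged.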