CVE-2023-34450 in CometBFT
Summary
by MITRE • 07/03/2023
CometBFT is a Byzantine Fault Tolerant (BFT) middleware that takes a state transition machine and replicates it on many machines. An internal modification made in versions 0.34.28 and 0.37.1 to the way struct `PeerState` is serialized to JSON introduced a deadlock when the new function `MarshalJSON` is called. This function can be called from two places. The first is via logs, when the `consensus` logging module is set to "debug" level (which should not happen in production) and the log output format is set to JSON. The second is via the RPC endpoint `dump_consensus_state`.
Case 1, which should not be hit in production, will eventually hit the deadlock in most goroutines, effectively halting the node.
In case 2, only the data structures related to the first peer will be deadlocked, together with the thread(s) dealing with the RPC request(s). This means that only one of the channels of communication to the node's peers will be blocked. Eventually the peer will time out and be excluded from the list (typically after 2 minutes). The goroutines involved in the deadlock will not be garbage collected, but they will not interfere with the system after the peer is excluded.
The theoretical worst case for case 2 is a network with only two validator nodes. In this case, each of the nodes only has one `PeerState` struct. If `dump_consensus_state` is called on either node (or both), the chain will halt until the peer connections time out, after which the nodes will reconnect (with different `PeerState` structs) and the chain will progress again. Then, the same process can be repeated.
As the number of nodes in a network increases, and with it the number of peer structs each node maintains, the likelihood of reproducing the disruption seen with two nodes decreases. Only the first `PeerState` struct will deadlock, and not the others: RPC `dump_consensus_state` accesses them in a `for` loop, so the deadlock at the first iteration means the remaining iterations of that loop are never reached.
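The first-iteration behaviour described above can be illustrated with a small Go sketch. The `peer` type, `serialize`, `dumpAll`, and `observe` are hypothetical names invented for this illustration and are not CometBFT code; the sketch only assumes a per-peer mutex that the serializer tries to take twice. It uses `sync.Mutex.TryLock`, which requires Go 1.18 or newer.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// settle is how long observe waits for the background loop to get stuck.
const settle = 100 * time.Millisecond

// peer is a hypothetical stand-in for a per-peer struct guarded by a mutex.
type peer struct {
	id  string
	mtx sync.Mutex
}

// serialize models a serializer that re-enters its own lock.
func (p *peer) serialize() string {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	return p.inner()
}

// inner tries to take the lock that serialize already holds. sync.Mutex is
// not reentrant, so this call never returns.
func (p *peer) inner() string {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	return p.id
}

// dumpAll visits the peers in order and announces each one before
// serializing it.
func dumpAll(peers []*peer, started chan<- string) {
	for _, p := range peers {
		started <- p.id
		_ = p.serialize()
	}
}

// observe runs dumpAll in the background, waits, then reports which peers
// the loop reached and which peer mutexes are still held.
func observe(peers []*peer) (reached []string, held []bool) {
	started := make(chan string, len(peers))
	go dumpAll(peers, started) // this goroutine is leaked once it blocks
	time.Sleep(settle)
	for len(started) > 0 {
		reached = append(reached, <-started)
	}
	for _, p := range peers {
		if p.mtx.TryLock() {
			p.mtx.Unlock()
			held = append(held, false)
		} else {
			held = append(held, true)
		}
	}
	return reached, held
}

func main() {
	peers := []*peer{{id: "peer0"}, {id: "peer1"}, {id: "peer2"}}
	reached, held := observe(peers)
	fmt.Println(reached, held)
}
```

In this sketch only `peer0` is ever reached and only its mutex stays held, which mirrors the summary's point that the remaining peers are untouched but also never serialized.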
This regression was fixed in versions 0.34.29 and 0.37.2. Some workarounds are available. For case 1 (hitting the deadlock via logs), either leave the log output format at "plain" rather than "json", or leave the `consensus` logging module at "info" or higher rather than "debug". For case 2 (hitting the deadlock via RPC `dump_consensus_state`), do not expose the `dump_consensus_state` RPC endpoint to the public internet (e.g., via rules in one's nginx setup).
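The summary suggests nginx rules for the case 2 workaround. The same idea can be sketched as a Go HTTP middleware placed in front of the RPC port. This is a minimal illustration under stated assumptions, not a vetted filter: `denyDump`, `isBlocked`, and `statusFor` are hypothetical names, the 1 MiB body limit is arbitrary, and the sketch assumes the method can arrive either as a URL path or as the `method` field of a JSON-RPC body. It does not inspect WebSocket traffic, which would need separate handling.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// blockedMethod is the RPC method named in the advisory.
const blockedMethod = "dump_consensus_state"

// rpcCall decodes only the method name of a JSON-RPC request.
type rpcCall struct {
	Method string `json:"method"`
}

// isBlocked reports whether the URL path or a JSON-RPC body (single or
// batch) names the blocked method. Bodies that do not parse are passed on
// and left for the upstream server to reject.
func isBlocked(path string, body []byte) bool {
	if strings.Trim(path, "/") == blockedMethod {
		return true
	}
	body = bytes.TrimSpace(body)
	if len(body) == 0 {
		return false
	}
	var calls []rpcCall
	if body[0] == '[' {
		if err := json.Unmarshal(body, &calls); err != nil {
			return false
		}
	} else {
		var one rpcCall
		if err := json.Unmarshal(body, &one); err != nil {
			return false
		}
		calls = append(calls, one)
	}
	for _, c := range calls {
		if c.Method == blockedMethod {
			return true
		}
	}
	return false
}

// denyDump wraps next and answers 403 for requests that target blockedMethod.
func denyDump(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		if isBlocked(r.URL.Path, body) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		r.Body = io.NopCloser(bytes.NewReader(body)) // restore the body for the upstream
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one request through denyDump in front of a stub upstream
// that always answers 200.
func statusFor(method, path, body string) int {
	upstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(method, path, strings.NewReader(body))
	denyDump(upstream).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("GET", "/dump_consensus_state", ""))
	fmt.Println(statusFor("POST", "/", `{"jsonrpc":"2.0","id":1,"method":"dump_consensus_state"}`))
	fmt.Println(statusFor("GET", "/status", ""))
}
```

A filter like this only limits who can reach the endpoint. Upgrading to a fixed release remains the actual remedy.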
Analysis
by VulDB Data Team • 07/22/2023
CometBFT is a Byzantine Fault Tolerant middleware system that replicates a state transition machine across multiple nodes in a distributed network. This architecture forms the backbone of many blockchain implementations, where maintaining consensus among potentially malicious participants is paramount. The vulnerability described in CVE-2023-34450 emerged from a seemingly innocuous modification to the JSON serialization of the `PeerState` struct in the 0.34.28 and 0.37.1 releases. This modification introduced a deadlock condition that affects the system's operational integrity. The flaw can be reached through two distinct vectors that follow different pathways within the system's architecture, each with a different degree of impact on network stability and availability.
The technical implementation of this vulnerability stems from improper synchronization mechanisms within the MarshalJSON function that handles serialization of PeerState structures. This function can be invoked from two primary entry points within the system's operational flow. The first pathway occurs through logging mechanisms when the consensus module operates at debug level with JSON output formatting enabled, a configuration typically found in development environments rather than production deployments. The second pathway involves direct invocation through the RPC endpoint dump_consensus_state which provides administrative access to internal consensus state information. Both pathways create conditions where concurrent access to shared memory structures results in thread blocking and deadlock scenarios. The underlying issue aligns with CWE-362, which addresses concurrent execution issues where a race condition exists between two or more threads or processes. This vulnerability specifically demonstrates how improper locking mechanisms can lead to system-wide operational failure.
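One common way a serialization routine ends up deadlocking on its own lock can be sketched in Go as follows. This is an illustration under assumptions, not the CometBFT source: `lockingPeer`, `snapshotPeer`, `peerJSON`, and `marshalWithTimeout` are hypothetical names, and the standard library's `encoding/json` is used in place of whatever encoder the project uses. The first type locks its mutex and then passes itself back to the encoder, which calls `MarshalJSON` again. The second marshals a plain copy that has no `MarshalJSON` method.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
	"time"
)

// probeTimeout bounds how long we wait before declaring a marshal call stuck.
const probeTimeout = 200 * time.Millisecond

// lockingPeer is a hypothetical mutex-guarded struct, not CometBFT's PeerState.
type lockingPeer struct {
	mtx    sync.Mutex
	Height int64
}

// MarshalJSON locks, then hands the same pointer back to encoding/json.
// The encoder sees a json.Marshaler and calls MarshalJSON again, and the
// second Lock blocks forever because sync.Mutex is not reentrant.
func (p *lockingPeer) MarshalJSON() ([]byte, error) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	return json.Marshal(p)
}

// snapshotPeer holds the same data but marshals a method-free copy.
type snapshotPeer struct {
	mtx    sync.Mutex
	Height int64
}

// peerJSON carries only the exported data and has no MarshalJSON method.
type peerJSON struct {
	Height int64 `json:"height"`
}

// MarshalJSON copies the fields under the lock and encodes the copy, so the
// encoder never re-enters this method.
func (p *snapshotPeer) MarshalJSON() ([]byte, error) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	return json.Marshal(peerJSON{Height: p.Height})
}

// marshalWithTimeout runs json.Marshal in a goroutine and reports whether it
// finished within probeTimeout. A stuck goroutine is leaked.
func marshalWithTimeout(v interface{}) (string, bool) {
	done := make(chan string, 1)
	go func() {
		b, _ := json.Marshal(v)
		done <- string(b)
	}()
	select {
	case s := <-done:
		return s, true
	case <-time.After(probeTimeout):
		return "", false
	}
}

func main() {
	_, ok := marshalWithTimeout(&lockingPeer{Height: 7})
	fmt.Println("locking variant finished:", ok)
	s, ok := marshalWithTimeout(&snapshotPeer{Height: 7})
	fmt.Println("snapshot variant finished:", ok, s)
}
```

The timeout wrapper exists only so the demonstration can report the hang instead of hanging itself. In a real node the blocked goroutine simply stays blocked, as the summary describes.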
The operational impact of this vulnerability differs significantly between the two scenarios, although both degrade availability. In the first scenario, involving logging, the deadlock eventually affects most goroutines, effectively halting the node and leaving it unresponsive. This is a critical availability issue for the affected node. The second scenario is more constrained: it blocks only the communication channel associated with the first peer in the peer list, together with the goroutines serving the RPC request. The affected node loses communication with that peer, which typically times out and is excluded after approximately two minutes. The more severe implications arise in networks with very few nodes, in particular the theoretical worst case of only two validator nodes. In that configuration the chain halts completely until the peer connections time out and are reestablished, and the halt can be triggered again by repeating the request. This behavior maps to ATT&CK technique T1499.004 (Endpoint Denial of Service: Application or System Exploitation), in which a software flaw is exploited to make a service unavailable.
The mitigation strategies for this vulnerability address both the immediate exploitation vectors and the underlying architectural weaknesses. For the logging-based attack vector, system administrators can avoid triggering the deadlock by either disabling JSON formatting for logs or reducing the consensus logging level from debug to info or higher. These configuration changes prevent the problematic code path from being executed in operational environments. For the RPC-based attack vector, network security measures such as restricting access to the dump_consensus_state endpoint through reverse proxy configurations like nginx can effectively prevent unauthorized exploitation. The fix implemented in versions 0.34.29 and 0.37.2 addresses the core synchronization issue by properly managing the locking mechanisms during serialization operations. This remediation aligns with standard security practices for preventing deadlock conditions in concurrent programming environments. The vulnerability demonstrates the importance of thorough testing of serialization routines in distributed systems, particularly when dealing with shared memory structures and concurrent access patterns. Organizations implementing CometBFT-based systems must ensure proper version management and security configuration to prevent exploitation of this class of vulnerabilities that can lead to complete network disruption.
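The logging-side workaround can be expressed as a small configuration check, sketched here in Go. The function names are hypothetical, and the parsing rests on an assumption about the `log_level` syntax: a comma-separated list of `module:level` pairs in which a bare level or `*` applies to all modules, with "info" as the default. Verify that assumption against the configuration reference for the release in use.

```go
package main

import (
	"fmt"
	"strings"
)

// consensusLevel returns the level that would apply to the consensus module.
// Assumed syntax: "module:level,*:level", or a bare level for all modules.
func consensusLevel(logLevel string) string {
	level := "info" // assumed default when nothing matches
	specific := false
	for _, item := range strings.Split(logLevel, ",") {
		item = strings.TrimSpace(item)
		if item == "" {
			continue
		}
		module, lvl := "*", item
		if i := strings.Index(item, ":"); i >= 0 {
			module = strings.TrimSpace(item[:i])
			lvl = strings.TrimSpace(item[i+1:])
		}
		switch {
		case module == "consensus":
			level, specific = lvl, true
		case module == "*" && !specific:
			level = lvl
		}
	}
	return strings.ToLower(level)
}

// triggersLoggingPath reports whether the two settings together match the
// combination the advisory warns about: consensus at debug with JSON output.
func triggersLoggingPath(logLevel, logFormat string) bool {
	jsonOutput := strings.EqualFold(strings.TrimSpace(logFormat), "json")
	return jsonOutput && consensusLevel(logLevel) == "debug"
}

func main() {
	fmt.Println(triggersLoggingPath("consensus:debug,*:info", "json"))
	fmt.Println(triggersLoggingPath("consensus:debug,*:info", "plain"))
	fmt.Println(triggersLoggingPath("info", "json"))
}
```

A check of this kind could run in a deployment pipeline to flag affected configurations on unpatched versions.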