CVE-2026-31971 in htslib
Summary
by MITRE • 03/18/2026
HTSlib is a library for reading and writing bioinformatics file formats. CRAM is a compressed format which stores DNA sequence alignment data using a variety of encodings and compression methods. When reading data encoded using the `BYTE_ARRAY_LEN` method, the `cram_byte_array_len_decode()` function failed to validate that the amount of data being unpacked matched the size of the output buffer where it was to be stored. Depending on the data series being read, this could result in either a heap or a stack buffer overflow with attacker-controlled bytes. If a user opens a file crafted to exploit this issue, it could lead to the program crashing, overwriting of data structures on the heap or stack in ways not expected by the program, or changing the control flow of the program. It may be possible to use this to obtain arbitrary code execution. Versions 1.23.1, 1.22.2 and 1.21.1 include fixes for this issue. There is no workaround for this issue.
Analysis
by VulDB Data Team • 03/23/2026
CVE-2026-31971 affects HTSlib, a widely used library for handling bioinformatics file formats, including CRAM, a compressed format for storing DNA sequence alignment data. The flaw resides in `cram_byte_array_len_decode()`, the function that processes data encoded with the `BYTE_ARRAY_LEN` method during CRAM file parsing. The function does not verify that the amount of data being unpacked matches the size of the output buffer it is written to, so a crafted file can make the decoder write past the end of that buffer.
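The missing check can be illustrated with a minimal sketch. This is not HTSlib's code: `byte_reader`, `decode_len_prefixed()` and the one-byte length prefix are hypothetical simplifications chosen for illustration (the real `BYTE_ARRAY_LEN` encoding reads its length and its bytes through separate sub-codecs). The point is only that a length taken from the file has to be compared with the destination capacity before anything is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical input cursor for this sketch; not an HTSlib type. */
typedef struct {
    const uint8_t *data; /* encoded input */
    size_t len;          /* total bytes in data */
    size_t pos;          /* index of the next unread byte */
} byte_reader;

/*
 * Decode one length-prefixed byte array into out[0 .. out_cap).
 * Assumed layout (illustration only): a 1-byte length followed by
 * that many payload bytes.
 * Returns the number of bytes copied, or -1 if the declared length
 * does not fit the output buffer or the input is truncated.
 * On failure the reader position is left unchanged.
 */
long decode_len_prefixed(byte_reader *in, uint8_t *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    if (in->pos >= in->len)
        return -1;                      /* no length byte available */

    size_t n = in->data[in->pos];       /* attacker-controlled length */

    if (n > out_cap)
        return -1;                      /* would overflow the output buffer */
    if (n > in->len - in->pos - 1)
        return -1;                      /* payload runs past the input */

    memcpy(out, in->data + in->pos + 1, n);
    in->pos += 1 + n;
    return (long)n;
}
```

Without the `n > out_cap` comparison, the `memcpy` would write `n` file-supplied bytes into `out` regardless of its size, which is the class of overflow described above. Whether `out` lives on the heap or the stack is decided by the caller, not by the decoder.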
The practical impact is a buffer overflow while a crafted CRAM file is being processed. Because the decoded length is not checked, an attacker can supply `BYTE_ARRAY_LEN` data that is larger than the destination buffer, and the excess attacker-controlled bytes are written beyond its bounds. Whether the overflow lands on the heap or on the stack depends on which data series is being read. The result can be a program crash, memory corruption, or, more critically, altered control flow that may enable arbitrary code execution.
This vulnerability maps to CWE-121, stack-based buffer overflow, and CWE-122, heap-based buffer overflow, both of which are fundamental and extensively documented memory safety weaknesses. The ATT&CK framework mapping given for this issue is T1059.006, Command and Scripting Interpreter: Python, if exploited through script-based attack vectors, or T1059.001, Command and Scripting Interpreter: PowerShell, depending on the execution environment. The severity is compounded by the fact that HTSlib is extensively used in genomics research, clinical diagnostics, and bioinformatics pipelines, so exploitation could compromise critical scientific data processing systems.
Versions 1.23.1, 1.22.2, and 1.21.1 contain the fixes for the buffer validation issue in `cram_byte_array_len_decode()`. The patches add bounds checking so that decoded data cannot exceed the allocated buffer size, which prevents the overflow and the memory corruption or code execution that could follow from it. Organizations using HTSlib should upgrade to one of these versions immediately, as no workaround exists. This is particularly concerning for environments where immediate patching is not feasible; those organizations will need to rely on additional monitoring and input validation measures until the official fixes can be deployed.
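As a rough aid for auditing deployments, the fixed-version list can be turned into a small check. The helper name `htslib_version_has_fix()` is hypothetical, and the sketch makes two assumptions that the advisory does not state: that release branches older than 1.21 have no fixed release, and that branches newer than 1.23 carry the fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: classify an HTSlib version string such as "1.22.1".
 * Returns 1 if the version is expected to contain the fix,
 *         0 if it is expected to be vulnerable,
 *        -1 if the string cannot be parsed.
 */
int htslib_version_has_fix(const char *version)
{
    assert(version != NULL);

    int major = 0, minor = 0, patch = 0;
    int fields = sscanf(version, "%d.%d.%d", &major, &minor, &patch);
    if (fields < 2)
        return -1;          /* need at least "major.minor" */
    if (fields == 2)
        patch = 0;          /* "1.21" is treated as 1.21.0 */

    if (major != 1)
        return major > 1;   /* assumption: any future 2.x is fixed */
    if (minor > 23)
        return 1;           /* assumption: newer branches carry the fix */
    if (minor == 23)
        return patch >= 1;  /* fixed in 1.23.1 */
    if (minor == 22)
        return patch >= 2;  /* fixed in 1.22.2 */
    if (minor == 21)
        return patch >= 1;  /* fixed in 1.21.1 */
    return 0;               /* assumption: older branches are unfixed */
}
```

The input could be the string returned by HTSlib's `hts_version()`. Development builds report strings such as `1.21-3-gabcdef`; this sketch parses only the leading numbers and therefore treats such a build as unfixed, which errs on the side of caution.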