CVE-2021-29549 in TensorFlow
Summary
by MITRE • 05/15/2021
TensorFlow is an end-to-end open source platform for machine learning. An attacker can cause a runtime division by zero error and denial of service in `tf.raw_ops.QuantizedBatchNormWithGlobalNormalization`. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/6f26b3f3418201479c264f2a02000880d8df151c/tensorflow/core/kernels/quantized_add_op.cc#L289-L295) computes a modulo operation without validating that the divisor is not zero. Since `vector_num_elements` is determined based on input shapes (https://github.com/tensorflow/tensorflow/blob/6f26b3f3418201479c264f2a02000880d8df151c/tensorflow/core/kernels/quantized_add_op.cc#L522-L544), a user can trigger scenarios where this quantity is 0. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit onto TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
Analysis
by VulDB Data Team • 05/19/2021
CVE-2021-29549 affects TensorFlow's implementation of the `tf.raw_ops.QuantizedBatchNormWithGlobalNormalization` operation. It is a denial of service issue caused by missing input validation in the quantized kernel code. For certain input shapes the operation hits a division by zero at runtime, so an attacker who controls the inputs can disrupt service availability. The flaw sits in the quantized add implementation (`quantized_add_op.cc`), where a modulo operation is performed without first verifying that the divisor is non-zero.
The root cause is the computation of `vector_num_elements`, which serves as the divisor for the modulo operation. Its value is derived from the shapes of the input tensors and is not validated before use. When an attacker supplies input tensors whose shapes yield an element count of zero, the modulo operation divides by zero and fails at runtime. The weakness is classified as CWE-369 (Divide By Zero). The vulnerability affects TensorFlow versions prior to 2.5.0. The fix ships in 2.5.0 and is backported to the patch releases 2.4.2, 2.3.3, 2.2.3, and 2.1.4, so earlier releases on each of those supported branches are affected.
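The pattern described above can be illustrated with a small pure-Python sketch. It is not the TensorFlow kernel: the function names and the list-based operands are hypothetical stand-ins, and only the variable name `vector_num_elements` comes from the advisory. In Python the unguarded modulo raises `ZeroDivisionError`, whereas in C++ an integer modulo by zero is undefined behavior and typically ends the process.

```python
def broadcast_add_unchecked(tensor, vector):
    """Hypothetical stand-in for the unguarded pattern.

    The vector operand is broadcast over the tensor by indexing it
    with `i % vector_num_elements`, with no check on the divisor.
    """
    vector_num_elements = len(vector)  # derived from the input shape
    return [x + vector[i % vector_num_elements] for i, x in enumerate(tensor)]


def broadcast_add_checked(tensor, vector):
    """Same computation, but the divisor is validated first."""
    vector_num_elements = len(vector)
    if vector_num_elements == 0:
        # Reject the input with a clear error before any modulo is computed.
        raise ValueError("vector operand must have at least one element")
    return [x + vector[i % vector_num_elements] for i, x in enumerate(tensor)]
```

With a non-empty vector both functions agree; with an empty vector only the checked variant fails in a controlled way that a caller can handle.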
The operational impact is loss of availability. An attacker exploits the weakness by submitting input whose tensor shapes make `vector_num_elements` evaluate to zero, which triggers the division by zero and interrupts the TensorFlow process handling the request. Systems that pass untrusted data into TensorFlow operations are most exposed, including web applications, data processing pipelines, and model serving environments where input shapes are not validated. In ATT&CK terms the closest match is T1499.004 (Endpoint Denial of Service: Application or System Exploitation). The division by zero does not by itself provide code execution, so execution techniques such as T1059 (Command and Scripting Interpreter) would only be relevant as separate steps in a broader attack chain.
Mitigation should start with upgrading to TensorFlow 2.5.0 or to the applicable cherry-picked release on an older supported branch. Where an immediate upgrade is not possible, organizations can add an input validation layer that checks tensor dimensions and rejects zero-element inputs before they reach the vulnerable kernel code. Additional defensive measures include runtime monitoring for anomalous tensor shapes and process-level error handling, so that a failing operation does not take down the whole service. The fix addresses the root cause by checking the divisor before the modulo operation is performed.
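A pre-check of this kind could look like the following minimal sketch. It assumes the caller already knows the shape of each input as a tuple of integers; the function name `validate_nonempty_shapes` and the input names in the usage note are illustrative and not part of any TensorFlow API.

```python
from math import prod


def validate_nonempty_shapes(named_shapes):
    """Reject any input whose shape implies zero elements.

    named_shapes: dict mapping an (illustrative) input name to its
    shape, given as a tuple of integer dimensions.
    """
    for name, shape in named_shapes.items():
        if any(dim < 0 for dim in shape):
            raise ValueError(f"{name}: negative dimension in shape {shape}")
        # prod(()) == 1, so a scalar shape counts as one element.
        if prod(shape) == 0:
            raise ValueError(f"{name}: shape {shape} has zero elements")
```

A serving layer might call `validate_nonempty_shapes({"t": (1, 2, 2, 3), "m": (3,)})` before invoking the operation and return a client error when it raises. This is a stopgap for unpatched deployments and does not replace the upgrade.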