Benchmarking Bit Errors in Quantized Neural Networks with PyTorch

Similar to my article series on adversarial robustness, I was planning to have a series on bit error robustness accompanied by PyTorch code. Instead, due to time constraints, I decided to condense the information into a single article. The code for the originally planned six articles is available on GitHub.

Introduction

I was planning to have an article series on experimenting with bit errors in quantized deep networks — similar to my article series on adversarial robustness — with accompanying PyTorch code. However, in light of the incredible recent progress in machine learning, I decided to focus on other projects. Nevertheless, I wanted to share the tutorial code I prepared for those interested in quantization and bit error robustness. So, in this article, I share links to the code, some results, and pointers to the relevant literature and background.

Figure 1: Example of a deep learning accelerator called Dante from [3].

With the incredible interest in deploying deep networks, doing so efficiently on special-purpose hardware becomes more and more important. Often, this involves so-called deep learning accelerators that are explicitly designed for specific architectures and allow space- and energy-efficient inference. These chips are often extremely small, as shown in Figure 1, and comprise arrays of processing units alongside memory for activations and weights. However, making such chips effective, energy-efficient and secure puts additional requirements on the deployed neural networks: networks need to be quantized to a few bits per weight, and this quantization needs to be robust to bit errors. The latter are caused by low-voltage operation [3] to improve energy efficiency, by faulty hardware, or by bit-level attacks on chips.

Our work on bit error robustness [1,2] tackles these requirements on several levels: quantization, regularization and training. The repository below contains PyTorch code demonstrating our work in six steps:

  • Fixed-point neural network quantization
  • Quantization-aware training yielding 4.5% 4-bit test error on CIFAR10
  • Implementing bit operations in PyTorch with CUDA
  • Benchmarking bit errors in quantized neural networks
  • Training with weight clipping to improve robustness
  • Training with random bit errors
PyTorch code on GitHub
  • [1] D. Stutz, N. Chandramoorthy, M. Hein, B. Schiele. Bit Error Robustness for Energy-Efficient DNN Accelerators. MLSys, 2021.
  • [2] D. Stutz, N. Chandramoorthy, M. Hein, B. Schiele. Random and Adversarial Bit Error Robustness: Energy-Efficient and Secure DNN Accelerators. TPAMI, 2022.
  • [3] N. Chandramoorthy, K. Swaminathan, M. Cochet, A. Paidimarri, S. Eldridge, R. V. Joshi, M. M. Ziegler, A. Buyuktosunoglu, P. Bose. Resilient Low Voltage Accelerators for High Energy Efficiency. HPCA, 2019.

More details and pointers

In general, network quantization determines how weights are represented in a chip's memory. Most research in deep learning is done in 16- or 32-bit floating point, but this is neither suitable nor efficient on special-purpose hardware. A simple quantization scheme, fixed-point quantization, quantizes each weight into $m$ bits, allowing for $2^m$ distinct values. A weight $w_i \in [-q_{\text{max}}, q_{\text{max}}]$ is represented by a signed or unsigned $m$-bit integer, allowing for integer-based arithmetic. Even though this is a rather simple quantization scheme, it involves quite a few details: whether to use signed or unsigned integers, symmetric or asymmetric quantization, whether the quantization range is determined per layer or globally across the network, etc. Details can be found in Section 4.1 of [2]; implementations are included in 101-network-quantization/common/quantization.py with an example in 101-network-quantization/examples.
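
To make this concrete, here is a minimal sketch of symmetric, per-tensor fixed-point quantization; the function name, its defaults and the fallback choice of $q_{\text{max}}$ are illustrative assumptions rather than the repository's exact implementation:

```python
import torch

def quantize_fixed_point(weights, m=8, q_max=None):
    """Symmetric fixed-point quantization sketch: map weights in
    [-q_max, q_max] to signed m-bit integers and back. If q_max is None,
    the maximum absolute weight is used (one of several possible choices)."""
    if q_max is None:
        q_max = weights.abs().max()
    levels = 2 ** (m - 1) - 1          # largest representable integer
    scale = q_max / levels
    # round to the nearest integer and clamp to the representable range
    w_int = torch.clamp(torch.round(weights / scale), -levels, levels).to(torch.int32)
    w_quant = w_int.float() * scale    # de-quantized weights for simulation
    return w_int, w_quant

# usage: simulate 4-bit quantization of a random weight tensor
w = 0.1 * torch.randn(64, 32)
w_int, w_q = quantize_fixed_point(w, m=4)
print((w - w_q).abs().max())  # maximum quantization error
```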

Naive quantization of pre-trained networks, especially with few bits, leads to a significant reduction in accuracy. Thus, it is common practice to quantize during training in order to obtain networks that are robust to the introduced quantization errors. 102-quantization-aware-training/examples/train.py allows training networks with nearly no accuracy degradation at 8 bits. The table below summarizes some results for wide ResNets and SimpleNets with batch normalization (BN) and group normalization (GN); a small sketch of the underlying idea follows the table:

Model                Test Error in %
WRN-28-10 4-bit BN   5.83
WRN-28-10 4-bit GN   6.33
WRN-28-10 8-bit BN   2.58
WRN-28-10 8-bit GN   3.17
SimpleNet 4-bit BN   6.35
SimpleNet 4-bit GN   5.7
SimpleNet 8-bit BN   3.65
SimpleNet 8-bit GN   4.78
More details can be found in Section 4.2 of [2].
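
To give an idea of how quantization-aware training can be implemented, here is a minimal sketch based on a straight-through estimator: weights are quantized in the forward pass while gradients pass through as if quantization were the identity. Class and helper names are illustrative; the actual implementation lives in 102-quantization-aware-training:

```python
import torch

class STEQuantize(torch.autograd.Function):
    """Straight-through estimator: quantize in the forward pass, pass
    gradients through unchanged in the backward pass."""

    @staticmethod
    def forward(ctx, weights, m, q_max):
        levels = 2 ** (m - 1) - 1
        scale = q_max / levels
        return torch.clamp(torch.round(weights / scale), -levels, levels) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # gradients flow to the weights as if quantization were the identity
        return grad_output, None, None

def quantized_linear(layer, x, m=4, q_max=0.25):
    # hypothetical helper: quantize the layer's weights on the fly
    w_q = STEQuantize.apply(layer.weight, m, q_max)
    return torch.nn.functional.linear(x, w_q, layer.bias)
```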
Given a quantized network, we want to evaluate its robustness against bit errors in the quantized weights. However, it is not trivial to manipulate quantized tensors on the bit level. Luckily, PyTorch is easily extended using C/CUDA code, and bit operations are relatively straightforward to implement and parallelize over arrays since most operations (bitwise and, or, xor, or sampling random bits) are element-wise by construction. 103-bit-operations/common/torch/bitwise.py uses cffi and cupy to provide bit operations for PyTorch tensors.
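
Since PyTorch also ships integer bitwise operators, random bit errors can alternatively be sketched in pure PyTorch; random_bit_errors below and its two's-complement handling are illustrative assumptions, not the repository's code:

```python
import torch

def random_bit_errors(w_int, p, m=8, generator=None):
    """Flip each of the m bits of every quantized weight independently with
    probability p. Pure-PyTorch sketch; the repository instead uses
    cffi/cupy kernels in 103-bit-operations/common/torch/bitwise.py."""
    # interpret the signed integers as unsigned m-bit patterns (two's complement)
    bits = w_int.to(torch.int32) & ((1 << m) - 1)
    # build a random mask: bit j of each weight is set with probability p
    mask = torch.zeros_like(bits)
    for j in range(m):
        flip = torch.rand(bits.shape, generator=generator, device=bits.device) < p
        mask |= flip.to(torch.int32) << j
    flipped = bits ^ mask
    # re-interpret the flipped patterns as signed m-bit integers
    return torch.where(flipped >= (1 << (m - 1)), flipped - (1 << m), flipped)
```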

Using bit operations in PyTorch, we can easily evaluate the impact of bit errors on quantized networks. Even small bit error rates reduce accuracy quite significantly, especially for models with BN, where accuracy quickly drops to chance level. GN-based models, in contrast, are more robust. Note that these numbers already include some of our more robust fixed-point quantization; a benchmarking sketch follows the table:

                     Test Error in %
Model                No bit errors   0.1% bit errors   1% bit errors
SimpleNet 4-bit GN   5.7             7.62              44.17
SimpleNet 8-bit GN   4.78            6.22              34.5
More details can be found in Section 4.1 of [2] as well as Section 5.5.
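
Putting the pieces together, benchmarking amounts to quantizing a copy of the model, injecting random bit errors into its quantized weights, and measuring test error. A minimal sketch, reusing the hypothetical quantize_fixed_point and random_bit_errors helpers from above:

```python
import copy
import torch

def evaluate_with_bit_errors(model, test_loader, p, m=8, q_max=0.1, device="cuda"):
    """Benchmark sketch: quantize a copy of the model's weights, inject random
    bit errors with rate p, and measure test error."""
    scale = q_max / (2 ** (m - 1) - 1)
    perturbed = copy.deepcopy(model).to(device).eval()
    with torch.no_grad():
        for param in perturbed.parameters():
            w_int, _ = quantize_fixed_point(param.data, m=m, q_max=q_max)
            w_int = random_bit_errors(w_int, p=p, m=m)
            param.copy_(w_int.float() * scale)  # de-quantized, perturbed weights

        errors, total = 0, 0
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            predictions = perturbed(inputs).argmax(dim=1)
            errors += (predictions != targets).sum().item()
            total += targets.numel()
    return 100.0 * errors / total  # test error in %
```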

A simple but extremely effective regularization scheme to improve bit error robustness is weight clipping: during training, weights are constrained to stay within a limited range $[-w_{\text{max}},w_{\text{max}}]$. While it may seem that this improves robustness only because it reduces the quantization range, and thus the absolute magnitude of bit errors, it actually turns out that this regularization prefers more "distributed" weights. That is, more weights contribute to the model's predictions, and this improves robustness to bit errors. Here are some results for the above models; a small sketch of weight clipping follows the table:

                                                 Test Error in %
Model                                            No bit errors   0.1% bit errors   1% bit errors
SimpleNet 4-bit GN                               5.7             7.62              44.17
SimpleNet 4-bit GN with $w_{\text{max}} = 0.1$   5.85            6.53              10.56
SimpleNet 8-bit GN                               4.78            6.22              34.5
SimpleNet 8-bit GN with $w_{\text{max}} = 0.1$   5.72            6.43              10.92
More results can be found in Section 5.5 of [2].
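
Weight clipping itself is easy to add to any training loop: after each optimizer step, the weights are simply projected back onto $[-w_{\text{max}}, w_{\text{max}}]$. A minimal sketch, with an illustrative function name:

```python
import torch

def clip_weights_(model, w_max=0.1):
    """Project all weights back into [-w_max, w_max] after an optimizer step.
    Minimal sketch; w_max = 0.1 corresponds to the setting in the table above."""
    with torch.no_grad():
        for param in model.parameters():
            param.clamp_(-w_max, w_max)

# usage inside a training loop (sketch):
#   loss.backward()
#   optimizer.step()
#   clip_weights_(model, w_max=0.1)
```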

On top of weight clipping, it is also effective to inject bit errors during training to improve robustness. Results for this so-called random bit error training with bit error rate $p$ are included below; a training sketch follows the table:

                                                                Test Error in %
Model                                                           No bit errors   0.1% bit errors   1% bit errors
SimpleNet 8-bit GN                                              4.78            6.22              34.5
SimpleNet 8-bit GN with $w_{\text{max}} = 0.1$                  5.72            6.43              10.92
SimpleNet 8-bit GN with $w_{\text{max}} = 0.1$ and $p = 0.1\%$  5.34            6.07              9.33
SimpleNet 8-bit GN with $w_{\text{max}} = 0.1$ and $p = 1\%$    5.67            6.27              8.53
More results can be found in Section 5.5 of [2].
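
For completeness, here is a sketch of how bit error injection during training could be combined with the straight-through estimator from above; this is a simplified stand-in for the repository's implementation, reusing the hypothetical helpers defined earlier:

```python
import torch

def forward_with_bit_errors(layer, x, p=0.01, m=8, q_max=0.1):
    """Random bit error training sketch: quantize the weights, flip bits with
    probability p, de-quantize, and run the forward pass with the perturbed
    weights. Gradients flow to the clean weights via a straight-through trick."""
    levels = 2 ** (m - 1) - 1
    scale = q_max / levels
    with torch.no_grad():
        w_int, _ = quantize_fixed_point(layer.weight, m=m, q_max=q_max)
        w_perturbed = random_bit_errors(w_int, p=p, m=m).float() * scale
    # straight-through: forward uses perturbed weights, backward sees layer.weight
    w = layer.weight + (w_perturbed - layer.weight).detach()
    return torch.nn.functional.linear(x, w, layer.bias)

# per training step (sketch): compute the loss on forward_with_bit_errors(...)
# outputs, then optimizer.step() followed by clip_weights_(model, w_max=0.1)
```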
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.