Mixed precision training

Mixed precision training uses both 16-bit and 32-bit floating-point types while training a model, so that it runs faster and consumes less memory. On modern GPUs, mixed precision can improve performance by as much as 3x, while on TPUs it can improve performance by 60%.

Most models today use the float32 type, which takes 32 bits per value. Two lower-precision types, float16 and bfloat16, each take only 16 bits. Modern accelerators run operations faster in these types because they have dedicated hardware for 16-bit calculations and because 16-bit values can be read from memory faster. As a result, on such devices these lower-precision types should be used wherever feasible.
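To make the size and precision difference concrete, here is a minimal sketch using only Python's standard struct module, whose "e" and "f" formats correspond to IEEE 754 half precision (float16) and single precision (float32). The helper name round_trip is made up for this illustration, and bfloat16 is left out because the standard library has no format code for it.

```python
import struct

FP16 = "<e"  # IEEE 754 half precision (float16)
FP32 = "<f"  # IEEE 754 single precision (float32)


def round_trip(value: float, fmt: str) -> float:
    """Store a Python float in the given format and read it back (hypothetical helper)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


bytes_fp16 = struct.calcsize(FP16)  # 2 bytes = 16 bits
bytes_fp32 = struct.calcsize(FP32)  # 4 bytes = 32 bits

# The same value stored in each type: float16 keeps far fewer significant digits.
third_fp16 = round_trip(1 / 3, FP16)
third_fp32 = round_trip(1 / 3, FP32)

print(bytes_fp16, bytes_fp32)  # 2 4
print(third_fp16, third_fp32)
```

Half the bytes per value means half the memory traffic for the same tensor, at the cost of a visibly coarser approximation of 1/3.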

To maintain numerical stability, variables and a few other numerically sensitive operations still need to be kept in float32. Even so, the model trains faster than it would using 32-bit types throughout.
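One way to see why the variables stay in float32 is to apply many small updates to a weight stored in each type. The sketch below is an illustration only: the weight, the update size, and the helper round_trip are all made-up values and names, and the struct module stands in for real 16-bit and 32-bit storage.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Round a Python float to the given storage format (hypothetical helper)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


update = 1e-4  # hypothetical per-step update (learning rate times gradient)

w_fp16 = 1.0  # weight stored in float16
w_fp32 = 1.0  # weight stored in float32
for _ in range(100):
    # After each step the result is rounded back to the storage type.
    w_fp16 = round_trip(w_fp16 + update, "<e")
    w_fp32 = round_trip(w_fp32 + update, "<f")

print(w_fp16)  # 1.0: every update was rounded away
print(w_fp32)  # close to 1.01
```

Near 1.0, neighbouring float16 values are about 0.001 apart, so an update of 0.0001 is lost on every step, while the float32 copy accumulates all one hundred updates.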

This approach is known as mixed-precision training because it uses both 16-bit (FP16) and 32-bit (FP32) types.

Loss Scaling

Although mixed-precision training largely solves the problem of preserving accuracy, experiments showed that very small gradient values can occur even before they are multiplied by the learning rate.

Underflow is the problem in which such gradients become zero because they are too small to be represented at FP16 precision.
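Underflow is easy to reproduce with Python's standard struct module, whose "e" format is IEEE 754 half precision. In this minimal sketch, the helper to_fp16 and the gradient value are assumptions chosen for illustration.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest FP16 value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


smallest_fp16 = 2.0 ** -24  # smallest positive FP16 value, about 5.96e-8
tiny_gradient = 1e-8        # hypothetical gradient below that threshold

print(to_fp16(smallest_fp16) == smallest_fp16)  # True: still representable
print(to_fp16(tiny_gradient))                   # 0.0: the gradient has underflowed
```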

To address this, the researchers proposed loss scaling, which entails multiplying the loss value by a scale factor after the forward pass and before back-propagation. By the chain rule, all gradients are then scaled by the same factor, which brings them into the range that FP16 can represent.

After the gradients have been calculated, divide them by the same scale factor before using them to update the master weights in FP32.
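The scale-then-unscale round trip can be sketched with the standard struct module. This is an illustration under stated assumptions: the gradient value, the scale factor of 1024, and the helper to_fp16 are all made up, and the gradient is multiplied directly rather than obtained by back-propagating a scaled loss (by the chain rule the two give the same result).

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest FP16 value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


true_gradient = 1e-8  # hypothetical gradient, too small for FP16
scale = 1024.0        # hypothetical loss scale factor

unscaled_fp16 = to_fp16(true_gradient)        # 0.0: lost to underflow
scaled_fp16 = to_fp16(true_gradient * scale)  # survives in FP16

# Unscaling is done in higher precision, as it would be for FP32 master weights.
recovered = scaled_fp16 / scale

print(unscaled_fp16)  # 0.0
print(recovered)      # approximately 1e-8
```

The recovered value is not exact, because the scaled gradient was still rounded to FP16, but it is within about one percent of the true gradient instead of being zero.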

In theory, there is no disadvantage to using a large scaling factor, as long as it is not large enough to cause overflow.

Overflow occurs when the gradients, multiplied by the scaling factor, exceed the largest value FP16 can represent (65504). When this happens, the gradient becomes infinite or NaN.

In this situation the step is skipped, since an infinite gradient cannot be used to calculate the weight update, and the loss scale is lowered for future iterations.
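The skip-and-lower behaviour can be sketched as a single update function. This is an illustrative sketch, not the logic of any particular library: the function names, the choice to halve the scale on overflow, and the numbers in the usage lines are all assumptions.

```python
import math
import struct


def to_fp16(value: float) -> float:
    """Round to FP16; values beyond the FP16 range become infinity (hypothetical helper)."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, value)


def scaled_step(weight: float, grad: float, lr: float, scale: float):
    """Return (new_weight, new_scale, skipped) for one update (hypothetical)."""
    scaled_grad = to_fp16(grad * scale)  # gradient as FP16 back-propagation would see it
    if not math.isfinite(scaled_grad):
        # Overflow: leave the weight untouched and lower the scale.
        # Halving is an assumed policy; the text only says the scale is lowered.
        return weight, scale / 2.0, True
    # No overflow: unscale in higher precision and apply the update.
    return weight - lr * (scaled_grad / scale), scale, False


# First attempt overflows (0.5 * 131072 = 65536 exceeds the FP16 maximum of 65504).
w1, s1, skipped1 = scaled_step(weight=1.0, grad=0.5, lr=0.1, scale=131072.0)
# Second attempt with the lowered scale succeeds.
w2, s2, skipped2 = scaled_step(weight=w1, grad=0.5, lr=0.1, scale=s1)

print(skipped1, s1)  # True 65536.0
print(skipped2, w2)  # False, weight close to 0.95
```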

Automatic Mixed Precision

In 2018, NVIDIA released Apex, a PyTorch extension that included AMP (Automatic Mixed Precision) functionality. This simplified the process of applying mixed-precision training in PyTorch.

Training may be switched from FP32 to mixed precision on the GPU with just a few lines of code. This has two major advantages:

  • Shorter training time: training ran 1.5x to 5.5x faster with no loss in model performance.
  • Lower memory requirements, which leaves room to increase other aspects of the model such as architecture size, batch size, and input data size.

As of PyTorch 1.6, NVIDIA and Facebook moved this functionality into core PyTorch. This resolved a number of issues with the Apex package, including version compatibility and build difficulties.
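A minimal sketch of what the native API looks like in a training loop is shown below, using torch.cuda.amp.autocast and torch.cuda.amp.GradScaler. The model, data, and hyperparameters are placeholders invented for the example, and the sketch falls back to plain FP32 when no GPU is present. Newer PyTorch releases expose the same functionality under torch.amp and may emit a deprecation warning for these names.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # mixed precision only when a GPU is available

# Placeholder model and data, chosen only to keep the example small.
model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
inputs = torch.randn(8, 16, device=device)
targets = torch.randn(8, 4, device=device)

# GradScaler handles loss scaling, overflow checks, and skipped steps.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for _ in range(3):
    optimizer.zero_grad()
    # Inside autocast, eligible operations run in FP16.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()  # back-propagate the scaled loss
    scaler.step(optimizer)         # unscale gradients; skip the step on inf/NaN
    scaler.update()                # adjust the scale for the next iteration

final_loss = loss.item()
print(final_loss)
```

Note that the model's parameters stay in float32 throughout; only the operations inside the autocast block are run in lower precision.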


Although floating-point precision is often overlooked, it is critical in the training of deep learning models, where tiny gradients and learning rates combine to produce gradient updates that require more bits to be represented correctly.

However, as state-of-the-art deep learning models push the limits of task performance, architectures grow, and precision must be weighed against training time, memory requirements, and available compute.

As a result, the ability of mixed-precision training to retain model performance while roughly halving memory use is a significant advance in deep learning.