A recent GitHub update for AMD’s open-source ROCm software suggests that future AMD GPUs might support the increasingly popular BFloat16 numeric format for deep learning training, following in the footsteps of Google, Intel, and Arm.
The update was a commit in the ROCm Software Platform repository, AMD’s open-source HPC platform for GPU computing, titled “more BF16 TN sizes.” The reference to BF16, short for BFloat16 or bfloat16, suggests that AMD might implement the format in hardware in a future GPU architecture.
BFloat16 is a recent numeric data format developed by Google for deep learning training and implemented in its TPUs. It keeps the 8-bit exponent of a standard FP32 floating-point number but truncates the 23-bit mantissa to 7 bits, dropping the lower 16 bits and reducing precision to roughly two to three decimal digits while preserving FP32’s dynamic range. As a 16-bit format it halves memory and bandwidth requirements relative to FP32, and Google also claims it is cheaper to implement in hardware than the IEEE 754 half-precision FP16 format.
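Because BF16 is just the top 16 bits of an FP32 value, the conversion can be illustrated with plain bit manipulation. The sketch below (hypothetical helper names, simple truncation rather than the round-to-nearest-even that real hardware typically uses) shows how the exponent survives while mantissa precision is lost:

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Take a float's FP32 bit pattern and keep only the top 16 bits
    (sign + 8-bit exponent + 7 mantissa bits). Truncation, no rounding."""
    fp32_bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return fp32_bits >> 16

def bf16_bits_to_fp32(bits: int) -> float:
    """Widen a BF16 bit pattern back to FP32 by zero-filling the
    16 truncated mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

# 1.0 survives exactly: sign 0, exponent 127, mantissa 0.
print(bf16_bits_to_fp32(fp32_to_bf16_bits(1.0)))        # 1.0
# Pi loses mantissa precision but keeps FP32's exponent range.
print(bf16_bits_to_fp32(fp32_to_bf16_bits(3.14159265)))  # 3.140625
```

The round trip makes the trade-off concrete: values keep their magnitude (same exponent field as FP32) but retain only about three significant decimal digits, which is generally acceptable for deep learning training.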
The BF16 format is increasingly supported in hardware. Intel has already announced that it will adopt the format broadly, supporting it in the upcoming Cooper Lake-SP Xeon, Agilex FPGA, and Nervana NNP-T. More recently, Arm announced BF16 support coming to the Armv8 architecture. With AMD now possibly joining the party, that would leave Nvidia as the only major AI hardware vendor without a public commitment to the format.