Keil Logo

Half-precision floating-point data types

6.4 Half-precision floating-point data types

Use the _Float16 data type for 16-bit floating-point values in your C and C++ source files.

Arm® Compiler 6 supports two half-precision (16-bit) floating-point scalar data types:

  • The IEEE 754-2008 __fp16 data type, defined in the Arm C Language Extensions.
  • The _Float16 data type, defined in the C11 extension ISO/IEC TS 18661-3:2015

The __fp16 data type is not an arithmetic data type. The __fp16 data type is for storage and conversion only. Operations on __fp16 values do not use half-precision arithmetic. The values of __fp16 automatically promote to single-precision float (or double-precision double) floating-point data type when used in arithmetic operations. After the arithmetic operation, these values are automatically converted to the half-precision __fp16 data type for storage. The __fp16 data type is available in both C and C++ source language modes.

The _Float16 data type is an arithmetic data type. Operations on _Float16 values use half-precision arithmetic. The _Float16 data type is available in both C and C++ source language modes.

Arm recommends that for new code, you use the _Float16 data type instead of the __fp16 data type. __fp16 is an Arm C Language Extension and therefore requires compliance with the ACLE. _Float16 is defined by the C standards committee, and therefore using _Float16 does not prevent code from being ported to architectures other than Arm. Also, _Float16 arithmetic operations directly map to Armv8.2-A half-precision floating-point instructions when they are enabled on Armv8.2-A and later architectures. This avoids the need for conversions to and from single-precision floating-point, and therefore results in more performant code. If the Armv8.2-A half-precision floating-point instructions are not available, _Float16 values are automatically promoted to single-precision, similar to the semantics of __fp16 except that the results continue to be stored in single-precision floating-point format instead of being converted back to half-precision floating-point format.

To define a _Float16 literal, append the suffix f16 to the compile-time constant declaration. There is no implicit argument conversion between _Float16 and standard floating-point data types. Therefore, an explicit cast is required for promoting _Float16 to a single-precision floating-point format, for argument passing.

extern void ReadFloatValue(float f);

void ReadValues(void)
{
    // Half-precision floating-point value stored in the _Float16 data type.
    const _Float16 h = 1.0f16; 

 
    // There is no implicit argument conversion between _Float16 and standard floating-point data types.
    // Therefore, this call to the ReadFloatValue() function below is not a call to the declared function extern void ReadFloatValue(float f).
    ReadFloatValue(h);

    // An explicit cast is required for promoting a _Float16 value to a single-precision floating-point value.
    // Therefore, this call to the ReadFloatValue() function below is a call to the declared function extern void ReadFloatValue(float f).
    ReadFloatValue((float)h);

    return;
}

In an arithmetic operation where one operand is of __fp16 data type and the other is of _Float16 data type, the _Float16 value is first converted to __fp16 value and then the operation is completed as if both operands were of __fp16 data type.

void AddValues(_Float16 a, __fp16 b)
{
    _Float16 c;
    __fp16 d;

    // This addition is evaluated in 16-bit half-precision arithmetic. 
    // The result is stored in 16 bits using the _Float16 data type.
    c = a+a;

    // This addition is evaluated in 32-bit single-precision arithmetic. 
    // The result is stored in 16 bits using the __fp16 data type.
    d = b+b; 

    // The value in variable 'a' in this addition is converted to a __fp16 value. 
    // And then the addition is evaluated in 32-bit single-precision arithmetic. 
    // The result is stored in 16 bits using the __fp16 data type.
    d = a+b;

    return;
}

To generate Armv8.2 half-precision floating-point instructions using armclang, you must use the +fp16 architecture extension, for example:

armclang --target=aarch64-arm-none-eabi -march=armv8.2-a+fp16
armclang --target=aarch64-arm-none-eabi -mcpu=cortex-a75+fp16
armclang --target=arm-arm-none-eabi -march=armv8.2-a+fp16
armclang --target=arm-arm-none-eabi -mcpu=cortex-a75+fp16
Non-ConfidentialPDF file icon PDF version100067_0612_00_en
Copyright © 2014–2019 Arm Limited or its affiliates. All rights reserved. 
  Arm logo
Important information

This site uses cookies to store information on your computer. By continuing to use our site, you consent to our cookies.

Change Settings

Privacy Policy Update

Arm’s Privacy Policy has been updated. By continuing to use our site, you consent to Arm’s Privacy Policy. Please review our Privacy Policy to learn more about our collection, use and transfers
of your data.