Keil™, An ARM® Company

CARM User's Guide

Discontinued

double Format

A floating-point number is expressed as the product of three parts: the sign, the mantissa, and an exponent. For example:

double value = sign × 1.mantissa × 2^(exponent - bias)

Where

sign      Represents the sign of the floating-point number (+ or -).
mantissa  Represents the actual binary digits of the floating-point number. It is a 53-bit value (representing about 15 to 16 decimal digits) whose most significant bit (MSB) is always 1 and is, therefore, not stored.
exponent  Is an 11-bit value from 0 to 2047. The actual value of the exponent is calculated by subtracting the bias (1023) from the stored value (0 to 2047), giving a range of -1023 to +1024.
bias      Is a constant value (1023) subtracted from the stored exponent to obtain the actual exponent.

Floating-point numbers are stored in normalized form, which maximizes the quantity of numbers that can be represented. A normalized number has its binary point ('.') immediately after the first non-zero digit. In binary that digit is always 1, so it does not need to be stored, which is how the mantissa holds 53 binary digits in only 52 bits.
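The normalized formula can be checked with a short C sketch. It assumes a hosted C99 compiler, which may not match the CARM toolchain itself. The function name normalized_value is hypothetical, and ldexp from <math.h> performs the power-of-two scaling.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: evaluates sign x 1.mantissa x 2^(exponent - bias)
   for a normalized number, given the three stored fields. */
double normalized_value(unsigned sign, unsigned exponent, uint64_t mantissa)
{
    const int bias = 1023;
    uint64_t significand;
    double magnitude;

    assert(sign <= 1u);
    /* Stored exponent 0 marks zero and denormalized values; 2047 is
       reserved by IEEE 754, so only 1 to 2046 is accepted here. */
    assert(exponent >= 1u && exponent <= 2046u);
    assert(mantissa < ((uint64_t)1 << 52));     /* 52 stored bits */

    /* Restore the implied leading 1 to get the full 53-bit value. */
    significand = ((uint64_t)1 << 52) | mantissa;

    /* significand / 2^52 is 1.mantissa, so the extra -52 is folded
       into the exponent passed to ldexp. */
    magnitude = ldexp((double)significand, (int)exponent - bias - 52);

    return sign ? -magnitude : magnitude;
}
```

For example, a sign of 1, a stored exponent of 1026, and a mantissa whose top four bits are 1001 evaluate to -12.5.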

Denormalized floating-point numbers represent values whose magnitude is smaller than the smallest normalized value. The drawback is that precision decreases as the values get smaller. Denormalized floating-point values are represented as follows:

double value = sign × 0.mantissa × 2^(-1022)

Where

sign      Represents the sign of the floating-point number (+ or -).
mantissa  Represents the actual binary digits of the floating-point number. Only the 52 stored bits are significant. The implied leading digit is always 0 and is not stored.
-1022     Is the fixed value of the exponent.

If the stored exponent is zero and the mantissa is non-zero, the floating-point value is a denormalized number. For denormalized numbers, the exponent is treated as if a 1 were stored, so the actual exponent is -1022 (1 minus 1023).
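The denormalized rule can be sketched in C as follows. This is an illustration under assumptions: a hosted C99 compiler, a double that uses the 64-bit layout described in this section, and a floating-point implementation that does not flush denormalized values to zero. The names is_denormalized and denormalized_value are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the stored exponent is zero and
   the stored mantissa is non-zero, otherwise 0. */
int is_denormalized(double value)
{
    uint64_t bits;

    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);

    return ((bits >> 52) & 0x7FFu) == 0 &&
           (bits & (((uint64_t)1 << 52) - 1)) != 0;
}

/* Hypothetical helper: evaluates sign x 0.mantissa x 2^(-1022). */
double denormalized_value(unsigned sign, uint64_t mantissa)
{
    double magnitude;

    assert(sign <= 1u);
    assert(mantissa < ((uint64_t)1 << 52));

    /* 0.mantissa is mantissa / 2^52, so the total scale is 2^(-1022 - 52). */
    magnitude = ldexp((double)mantissa, -1022 - 52);

    return sign ? -magnitude : magnitude;
}
```

A mantissa of 1 gives the smallest positive denormalized value. A mantissa with only its top bit set gives half of the smallest normalized value.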

Double-precision floating-point numbers are stored using the following 64-bit format:

Bits      63-56      55-48      47-40      39-32
Contents  SEEE EEEE  EEEE MMMM  MMMM MMMM  MMMM MMMM

Bits      31-24      23-16      15-8       7-0
Contents  MMMM MMMM  MMMM MMMM  MMMM MMMM  MMMM MMMM

Where

S  Represents the sign bit, where 1 is negative and 0 is positive.
E  Is the exponent with a bias of 1023.
M  Is the 53-bit mantissa (stored in 52 bits).
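The bit layout can be expressed as a few shifts. The sketch below packs the three fields into a 64-bit pattern. It assumes a C99 compiler with <stdint.h>, and the function name pack_double_bits is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: builds the 64-bit storage pattern from the
   S (1 bit), E (11 bits) and M (52 stored bits) fields. */
uint64_t pack_double_bits(unsigned sign, unsigned exponent, uint64_t mantissa)
{
    assert(sign <= 1u);
    assert(exponent <= 0x7FFu);
    assert(mantissa < ((uint64_t)1 << 52));

    return ((uint64_t)sign << 63)        /* bit 63      */
         | ((uint64_t)exponent << 52)    /* bits 62-52  */
         | mantissa;                     /* bits 51-0   */
}
```

Packing a sign of 1, a stored exponent of 1026, and a mantissa of 9 shifted left by 48 bits yields 0xC029000000000000.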

Using the above format, the floating-point number -12.5 is stored as a hexadecimal value of 0xC029000000000000. In memory, this value appears as follows:

Bits      63-56  55-48  47-40  39-32
Contents  0xC0   0x29   0x00   0x00

Bits      31-24  23-16  15-8   7-0
Contents  0x00   0x00   0x00   0x00
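The byte values in this table can be reproduced with the sketch below, which lists the bytes of a double from bits 63-56 down to bits 7-0. The order in which those bytes sit at ascending addresses depends on the byte order of the target, so the code works on the 64-bit value rather than on raw memory. It assumes a C99 compiler, a double that uses this 64-bit layout, and the same byte order for integers and doubles. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the eight bytes of a double into out[],
   most significant byte (bits 63-56) first. */
void double_to_bytes_msb_first(double value, unsigned char out[8])
{
    uint64_t bits;
    int i;

    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);

    for (i = 0; i < 8; i++) {
        out[i] = (unsigned char)((bits >> (56 - 8 * i)) & 0xFFu);
    }
}
```

For -12.5 the first two bytes are 0xC0 and 0x29, and the remaining six are 0x00.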

It is fairly simple to convert floating-point numbers to and from their hexadecimal storage equivalents. The following example demonstrates how this is done for the value -12.5 shown above.

The floating-point storage representation is not an intuitive format. To convert this to a floating-point number, the bits must be separated as specified in the floating-point number storage format table shown above. For example:

Bits    63-56      55-48      47-40      39-32
Format  SEEE EEEE  EEEE MMMM  MMMM MMMM  MMMM MMMM
Binary  1100 0000  0010 1001  0000 0000  0000 0000
Hex     C    0     2    9     0    0     0    0

Bits    31-24      23-16      15-8       7-0
Format  MMMM MMMM  MMMM MMMM  MMMM MMMM  MMMM MMMM
Binary  0000 0000  0000 0000  0000 0000  0000 0000
Hex     0    0     0    0     0    0     0    0

From this illustration, you can determine the following:

  • The sign bit is 1, indicating a negative number.
  • The exponent value is 100 0000 0010 binary or 1026 decimal. Subtracting 1023 from 1026 leaves 3, which is the actual exponent.
  • The mantissa appears as the following binary number:
    1001 00000000 00000000 00000000 00000000 00000000 00000000
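The three observations above can be reproduced by masking and shifting the 64-bit pattern. This is a minimal sketch assuming a C99 compiler with <stdint.h>. The double_fields type and the split_double_bits name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of the fields read out of a 64-bit pattern. */
typedef struct {
    unsigned sign;            /* 1 = negative, 0 = positive            */
    unsigned exponent;        /* stored (biased) exponent, 0 to 2047   */
    int      actual_exponent; /* stored exponent minus the bias (1023) */
    uint64_t mantissa;        /* the 52 stored mantissa bits           */
} double_fields;

/* Hypothetical helper: separates the S, E and M fields. */
double_fields split_double_bits(uint64_t bits)
{
    double_fields f;

    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FFu);
    f.actual_exponent = (int)f.exponent - 1023;
    f.mantissa = bits & (((uint64_t)1 << 52) - 1);

    /* Self-check: the fields must recombine into the original pattern. */
    assert((((uint64_t)f.sign << 63) |
            ((uint64_t)f.exponent << 52) |
            f.mantissa) == bits);

    return f;
}
```

Applied to 0xC029000000000000, this gives a sign of 1, a stored exponent of 1026, an actual exponent of 3, and a mantissa whose top four bits are 1001.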
    

There is an understood binary point at the left of the mantissa that is always preceded by a 1. This digit is omitted from the stored form of the floating-point number. Adding 1 and the binary point to the beginning of the mantissa gives the following value:

1.1001000000000000000000000000000000000000000000000000

To adjust the mantissa for the exponent, move the binary point to the left for negative exponent values or to the right for positive exponent values. Since the exponent is three, the binary point moves three places to the right:

1100.1000000000000000000000000000000000000000000000000

The result is a binary floating-point number. Binary digits to the left of the binary point represent the power of two corresponding to their position. For example, 1100 represents (1 × 2^3) + (1 × 2^2) + (0 × 2^1) + (0 × 2^0), which is 12.

Binary digits to the right of the binary point also represent the power of two corresponding to their position. However, the powers are negative. For example, .100... represents (1 × 2^-1) + (0 × 2^-2) + (0 × 2^-3) + ... which equals 0.5.

The sum of these values is 12.5. Because the sign bit is set, the number is negative.
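Moving the binary point and summing the two sides can be sketched with integer shifts. The code below is an illustration only and handles actual exponents from 0 to 52, where the point falls inside the 53-bit value. It assumes a C99 compiler, and the function name split_at_binary_point is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: for 1.mantissa x 2^exponent, returns the digits
   left of the binary point as an integer and the digits right of it
   as a fraction. */
void split_at_binary_point(uint64_t mantissa, int exponent,
                           uint64_t *integer_part, double *fraction_part)
{
    uint64_t significand;
    uint64_t fraction_field;
    int fraction_bits;

    assert(mantissa < ((uint64_t)1 << 52));
    assert(exponent >= 0 && exponent <= 52);

    /* Restore the implied leading 1: 53 digits, point after the first. */
    significand = ((uint64_t)1 << 52) | mantissa;

    /* Moving the point right by 'exponent' leaves this many digits
       to the right of it. */
    fraction_bits = 52 - exponent;

    *integer_part = significand >> fraction_bits;
    fraction_field = significand & (((uint64_t)1 << fraction_bits) - 1);

    /* The digit k places right of the point is worth 2^-k, so the
       whole field is divided by 2^fraction_bits. */
    *fraction_part = (double)fraction_field /
                     (double)((uint64_t)1 << fraction_bits);
}
```

With the mantissa and exponent from this example, the integer part is 12 and the fraction part is 0.5.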

So, the hexadecimal value 0xC029000000000000 is -12.5.
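The result can be cross-checked by letting the compiler do the conversion in both directions. This sketch assumes a hosted C99 compiler, a double that uses this 64-bit layout, and the same byte order for integers and doubles. The function names bits_to_double and double_to_hex are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reinterprets a 64-bit pattern as a double. */
double bits_to_double(uint64_t bits)
{
    double value;

    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Hypothetical helper: formats the storage pattern of a double as
   "0x" followed by 16 hexadecimal digits (18 characters plus NUL). */
void double_to_hex(double value, char out[19])
{
    uint64_t bits;

    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    snprintf(out, 19, "0x%016" PRIX64, bits);
}
```

Passing 0xC029000000000000 to bits_to_double returns -12.5, and passing -12.5 to double_to_hex produces the string "0xC029000000000000".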