double Format

A floating-point number is expressed as the product of three parts: the sign, the mantissa, and an exponent. For example:

double value = sign × 1.mantissa × 2^(exponent - bias)

Where

sign Represents the sign of the floating-point number (+ or -).
mantissa Represents the actual binary digits of the floating-point number. It is a 53-bit value (representing about fifteen to sixteen decimal digits) whose most significant bit (MSB) is always 1 and is, therefore, not stored.
exponent Is an 11-bit value from 0 to 2047. The actual value of the exponent is calculated by subtracting the bias (1023) from the stored value (0 to 2047), giving a range of -1023 to +1024. The two end values of the stored exponent are reserved: 0 marks zero and denormalized numbers (described below), and 2047 marks infinities and NaNs, so normalized numbers use actual exponents from -1022 to +1023.
bias Is a constant value (1023) subtracted from the stored exponent to obtain the actual exponent.
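
The formula above can be checked with a short script. This is a minimal sketch in Python, chosen only for illustration, and it assumes the host stores Python floats in this same 64-bit format (true on common platforms). The helper names split_double and rebuild_normalized are hypothetical and are not part of any Keil library.

```python
import struct

BIAS = 1023  # exponent bias for the 64-bit double format


def split_double(value):
    """Return (sign, stored_exponent, stored_mantissa) for a float."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, still biased
    mantissa = bits & ((1 << 52) - 1)    # 52 stored bits
    return sign, exponent, mantissa


def rebuild_normalized(sign, exponent, mantissa):
    """Apply sign x 1.mantissa x 2^(exponent - bias) to a normalized value."""
    significand = 1 + mantissa / 2**52   # restore the implied leading 1
    magnitude = significand * 2.0 ** (exponent - BIAS)
    return -magnitude if sign else magnitude


sign, exponent, mantissa = split_double(6.5)
print(sign, exponent, hex(mantissa))                  # 0 1025 0xa000000000000
print(rebuild_normalized(sign, exponent, mantissa))   # 6.5
```

Here 6.5 is 1.625 × 2^2, so the stored exponent is 2 + 1023 = 1025 and the stored mantissa holds the fraction .625.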

Floating-point numbers are stored in normalized form, which maximizes the quantity of numbers that can be represented. Normalized numbers have a binary point ('.') after the first non-zero digit. Because that first digit is always 1, it does not need to be stored, which is how the mantissa is able to hold 53 binary digits in only 52 bits.
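
One way to observe the 53-digit limit is to convert integers that sit just below and just above it. This is a small Python sketch, assuming Python floats use this same 64-bit format. The variable names are illustrative only.

```python
# 2**53 - 1 needs exactly 53 significant binary digits, so it converts exactly.
fits = float(2**53 - 1) == 2**53 - 1
# 2**53 + 1 needs 54 significant binary digits, so it is rounded to a neighbor.
too_wide = float(2**53 + 1) == 2**53 + 1

print(fits)      # True
print(too_wide)  # False
```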

Denormalized floating-point numbers are used to represent values whose magnitude is too small to be represented as a normalized value. The drawback is that the precision decreases as the values get smaller. Denormalized floating-point values are represented as follows:

double value = sign × 0.mantissa × 2^(-1022)

Where

sign Represents the sign of the floating-point number (+ or -).
mantissa Represents the actual binary digits of the floating-point number. The 52 stored bits follow the binary point. The digit before the binary point is always 0 and is not stored.
-1022 Is the fixed value of the exponent.

If the stored exponent is zero and the mantissa is non-zero, the floating-point value is a denormalized number. For denormalized numbers, the exponent is treated as if a 1 were stored. Hence, the actual exponent is -1022 (1 minus 1023).
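
The denormalized formula can be tried on the smallest bit pattern that qualifies: a stored exponent of zero and a mantissa of 1. This is a minimal Python sketch, assuming Python floats use this same 64-bit format and that the host supports denormalized values. The helper names from_bits and rebuild_denormalized are hypothetical.

```python
import math
import struct


def from_bits(bits):
    """Reinterpret a 64-bit integer pattern as a double."""
    (value,) = struct.unpack(">d", struct.pack(">Q", bits))
    return value


def rebuild_denormalized(sign, mantissa):
    """Apply sign x 0.mantissa x 2^(-1022); the stored exponent is 0."""
    # math.ldexp(x, n) computes x * 2**n.
    magnitude = math.ldexp(mantissa / 2**52, -1022)
    return -magnitude if sign else magnitude


# Stored exponent 0, mantissa 1: the smallest positive denormalized value.
smallest = from_bits(0x0000000000000001)
print(smallest == rebuild_denormalized(0, 1))  # True
print(smallest == math.ldexp(1.0, -1074))      # True
```

With only the lowest mantissa bit set, the value is 2^(-52) × 2^(-1022) = 2^(-1074). It carries a single significant binary digit, which illustrates the loss of precision described above.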

Double-precision floating-point numbers are stored using the following 64-bit format:

Bits      63-56      55-48      47-40      39-32
Contents  SEEE EEEE  EEEE MMMM  MMMM MMMM  MMMM MMMM
Bits      31-24      23-16      15-8       7-0
Contents  MMMM MMMM  MMMM MMMM  MMMM MMMM  MMMM MMMM

Where

S represents the sign bit where 1 is negative and 0 is positive.
E is the exponent with a bias of 1023.
M is the 53-bit mantissa (stored in 52 bits).

Using the above format, the floating-point number -12.5 is stored as a hexadecimal value of 0xC029000000000000. In memory, this value appears as follows:

Bits      63-56  55-48  47-40  39-32
Contents  0xC0   0x29   0x00   0x00
Bits      31-24  23-16  15-8   7-0
Contents  0x00   0x00   0x00   0x00
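
The byte values in the table can be reproduced with Python's struct module. This is a sketch for checking the encoding on a host machine, not a description of how any Keil toolchain stores the value.

```python
import struct

# Big-endian packing lists the byte holding bits 63-56 first,
# matching the left-to-right order of the table above.
raw = struct.pack(">d", -12.5)

print(raw.hex())                      # c029000000000000
print([f"0x{b:02X}" for b in raw])
```

The order in which a particular target places these bytes at increasing addresses depends on its byte order, which this sketch does not model.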

It is fairly simple to convert floating-point numbers to and from their hexadecimal storage equivalents. The following example demonstrates how this is done for the value -12.5 shown above.

The floating-point storage representation is not an intuitive format. To convert this to a floating-point number, the bits must be separated as specified in the floating-point number storage format table shown above. For example:

Bits    63-56      55-48      47-40      39-32
Format  SEEE EEEE  EEEE MMMM  MMMM MMMM  MMMM MMMM
Binary  1100 0000  0010 1001  0000 0000  0000 0000
Hex     C    0     2    9     0    0     0    0
Bits    31-24      23-16      15-8       7-0
Format  MMMM MMMM  MMMM MMMM  MMMM MMMM  MMMM MMMM
Binary  0000 0000  0000 0000  0000 0000  0000 0000
Hex     0    0     0    0     0    0     0    0

From this illustration, you can determine the following:

  • The sign bit is 1, indicating a negative number.
  • The exponent value is 100 0000 0010 binary or 1026 decimal. Subtracting 1023 from 1026 leaves 3, which is the actual exponent.
  • The mantissa appears as the following binary number:
    1001 00000000 00000000 00000000 00000000 00000000 00000000
    

There is an understood binary point at the left of the mantissa that is always preceded by a 1. This digit is omitted from the stored form of the floating-point number. Adding 1 and the binary point to the beginning of the mantissa gives the following value:

1.1001000000000000000000000000000000000000000000000000

To adjust the mantissa for the exponent, move the binary point to the left for negative exponent values or to the right for positive exponent values. Since the exponent is three, the binary point moves three places to the right:

1100.1000000000000000000000000000000000000000000000000

The result is a binary floating-point number. Binary digits to the left of the binary point represent the power of two corresponding to their position. For example, 1100 represents (1 × 2^3) + (1 × 2^2) + (0 × 2^1) + (0 × 2^0), which is 12.

Binary digits to the right of the binary point also represent the power of two corresponding to their position. However, the powers are negative. For example, .100... represents (1 × 2^-1) + (0 × 2^-2) + (0 × 2^-3) + ..., which equals .5.

The sum of these values is 12.5. Because the sign bit is set, the number is negative.

So, the hexadecimal value 0xC029000000000000 is -12.5.
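
The manual conversion above can be retraced with integer operations. This is a minimal Python sketch that follows the same steps. The variable names are illustrative, and splitting the value at the binary point this way only works when the actual exponent lies between 0 and 52.

```python
bits = 0xC029000000000000

sign = bits >> 63                        # top bit
stored_exponent = (bits >> 52) & 0x7FF   # next 11 bits
mantissa = bits & ((1 << 52) - 1)        # low 52 bits

exponent = stored_exponent - 1023        # remove the bias
significand = (1 << 52) | mantissa       # prepend the implied 1

# The significand has 52 digits after the binary point. Moving the point
# right by `exponent` places leaves (52 - exponent) fractional digits.
fraction_bits = 52 - exponent
integer_part = significand >> fraction_bits
fraction_part = (significand & ((1 << fraction_bits) - 1)) / (1 << fraction_bits)

value = integer_part + fraction_part
if sign:
    value = -value

print(sign, stored_exponent, exponent)   # 1 1026 3
print(bin(integer_part), fraction_part)  # 0b1100 0.5
print(value)                             # -12.5
```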
