CARM User's Guide
A floating-point number is expressed as the product of three parts: the sign, the mantissa, and an exponent. For example:
double value = sign × 1.mantissa × 2^(exponent - bias)
Floating-point numbers are stored in normalized form, which maximizes the quantity of numbers that can be represented. A normalized number has its binary point ('.') immediately after the first non-zero digit. Because that digit is always 1 in binary, it does not need to be stored. This is how the mantissa is able to hold 53 binary digits in only 52 bits.
Denormalized floating-point numbers are used to represent values smaller than what can be represented by normalized values. The drawback is that the precision decreases with smaller values. Denormalized floating-point values are represented as follows:
double value = sign × 0.mantissa × 2^-1022
If the stored exponent is zero and the mantissa is non-zero, the floating-point value is a denormalized number. For denormalized numbers, the exponent is treated as if a 1 were stored. Hence, the actual exponent is -1022 (1 minus the bias of 1023).
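The normalized and denormalized formulas above can be checked with a short C sketch. It assumes a compiler where double is a 64-bit IEEE 754 value and where the standard ldexp function is available. The helper names bits_to_double and decode_fields are illustrative only and are not part of the CARM library. Infinity and NaN (stored exponent 0x7FF) are not handled.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit pattern as a double. */
static double bits_to_double(uint64_t bits)
{
    double d;
    assert(sizeof d == sizeof bits);   /* assumption: 64-bit double */
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Hypothetical helper: rebuild the value from the three stored fields.
 * Infinity and NaN (stored exponent 0x7FF) are not handled here. */
static double decode_fields(uint64_t bits)
{
    int      sign     = (int)(bits >> 63);
    int      exponent = (int)((bits >> 52) & 0x7FF);
    uint64_t mantissa = bits & 0x000FFFFFFFFFFFFFULL;  /* low 52 bits */
    double   value;

    if (exponent == 0) {
        /* Denormalized: 0.mantissa x 2^-1022 */
        value = ldexp((double)mantissa, -1022 - 52);
    } else {
        /* Normalized: 1.mantissa x 2^(exponent - 1023) */
        value = ldexp((double)(mantissa | (1ULL << 52)),
                      exponent - 1023 - 52);
    }
    return sign ? -value : value;
}
```

The mantissa is treated as a 52-bit integer, so the extra -52 in each ldexp call moves the binary point back to its position in front of the stored bits. Calling decode_fields(0xC029000000000000ULL) yields -12.5 under these assumptions.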
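Double-precision floating-point numbers are stored using the following 64-bit format:
Bit 63: sign (S), 1 bit
Bits 62-52: exponent (E), 11 bits, stored with a bias of 1023
Bits 51-0: mantissa (M), 52 bits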
Using the above format, the floating-point number -12.5 is stored as a hexadecimal value of 0xC029000000000000. In memory, this value appears as follows:
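Separately from the memory layout, the hexadecimal value quoted above can be reproduced with a small C sketch. It assumes a compiler where double is a 64-bit IEEE 754 value. The helper names double_to_bits and format_bits are illustrative and are not part of the CARM library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the storage of a double into a 64-bit integer. */
static uint64_t double_to_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);   /* assumption: 64-bit double */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Hypothetical helper: write the pattern as "0x" plus 16 hex digits.
 * The buffer must hold 19 characters including the terminator. */
static void format_bits(double d, char out[19])
{
    snprintf(out, 19, "0x%016llX", (unsigned long long)double_to_bits(d));
}
```

Using memcpy rather than a pointer cast avoids aliasing problems and gives the same numeric result regardless of the byte order of the target. For the value -12.5, format_bits produces the string 0xC029000000000000.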
It is fairly simple to convert floating-point numbers to and from their hexadecimal storage equivalents. The following example demonstrates how this is done for the value -12.5 shown above.
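The floating-point storage representation is not an intuitive format. To convert this to a floating-point number, the bits must be separated as specified in the floating-point number storage format table shown above. For example:
Hexadecimal: 0xC029000000000000
Binary: 1100 0000 0010 1001 0000 ... 0000
Sign (1 bit): 1
Exponent (11 bits): 10000000010
Mantissa (52 bits): 1001 followed by 48 zeros
From this illustration, you can determine the following:
The sign bit is 1, indicating a negative number.
The exponent value is 10000000010 binary or 1026 decimal. Subtracting the bias of 1023 from 1026 leaves 3, which is the actual exponent.
The mantissa is the binary number 1001000000000000...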
There is an understood binary point at the left of the mantissa that is always preceded by a 1. This digit is omitted from the stored form of the floating-point number. Adding 1 and the binary point to the beginning of the mantissa gives the following value:
1.1001000000000000...
To adjust the mantissa for the exponent, move the binary point to the left for negative exponent values or to the right for positive exponent values. Since the exponent is three, the binary point moves three places to the right and the mantissa is adjusted as follows:
1100.1000000000000...
The result is a binary floating-point number. Binary digits to the left of the binary point represent the power of two corresponding to their position. For example, 1100 represents (1 × 2^3) + (1 × 2^2) + (0 × 2^1) + (0 × 2^0), which is 12.
Binary digits to the right of the binary point also represent the power of two corresponding to their position. However, the powers are negative. For example, .100... represents (1 × 2^-1) + (0 × 2^-2) + (0 × 2^-3) + ..., which equals 0.5.
The sum of these values is 12.5. Because the sign bit is set, the number is negative.
So, the hexadecimal value 0xC029000000000000 is -12.5.
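The manual conversion above can be mirrored in C as a cross-check. The following is a minimal sketch, not CARM library code. The type double_parts and the function split_double are hypothetical names. The sketch only handles normalized values whose actual exponent lies between 0 and 52, which covers the -12.5 example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for the worked example. */
typedef struct {
    int      sign;           /* 1 = negative */
    int      exponent;       /* stored exponent minus 1023 */
    uint64_t integer_part;   /* digits left of the binary point */
    double   fraction_part;  /* digits right of the binary point */
} double_parts;

/* Hypothetical helper: split a 64-bit pattern the way the text does. */
static double_parts split_double(uint64_t bits)
{
    double_parts p;
    int      stored = (int)((bits >> 52) & 0x7FF);
    /* Restore the implied leading 1: 1.mantissa as a 53-bit integer. */
    uint64_t significand = (bits & 0x000FFFFFFFFFFFFFULL) | (1ULL << 52);
    int      shift;

    p.sign     = (int)(bits >> 63);
    p.exponent = stored - 1023;

    /* Sketch limitation: binary point must stay inside the 53 digits. */
    assert(p.exponent >= 0 && p.exponent <= 52);

    /* Moving the binary point right by 'exponent' places leaves
     * 'shift' digits to its right. */
    shift = 52 - p.exponent;
    p.integer_part  = significand >> shift;
    p.fraction_part = (double)(significand & ((1ULL << shift) - 1))
                    / (double)(1ULL << shift);
    return p;
}
```

For 0xC029000000000000ULL, split_double reports a sign of 1, an exponent of 3, an integer part of 12, and a fraction part of 0.5, which matches the result of -12.5 derived by hand.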