

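double Format

A floating-point number is expressed as the product of three parts: the sign, the mantissa, and an exponent. For example:

double value = sign × 1.mantissa × 2^{exponent-bias}

Where

sign is +1 for positive numbers or -1 for negative numbers.
mantissa is the 52-bit fraction stored in the number.
exponent is the stored (biased) exponent.
bias is 1023.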
Floating-point numbers are stored in normalized form, which maximizes the quantity of numbers that can be represented. Normalized numbers have a binary point ('.') after the first non-zero digit. Because that digit is always 1 in binary, it does not need to be stored. This is how the mantissa is able to hold 53 binary digits in only 52 bits.

Denormalized floating-point numbers are used to represent values smaller than what can be represented by normalized values. The drawback is that the precision decreases with smaller values. Denormalized floating-point values are represented as follows:

double value = sign × 0.mantissa × 2^{-1022}

Where

sign is +1 for positive numbers or -1 for negative numbers.
mantissa is the 52-bit stored fraction, with no implied leading 1.
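The two formulas (normalized and denormalized) can be sketched in C as follows. This is a minimal illustration, assuming a hosted C99 environment with a 64-bit IEEE 754 double. The function name double_from_fields and its parameters are hypothetical and are not part of the CARM library. The stored exponent value 2047, which is reserved for infinities and NaNs, is not handled.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: rebuild a double from its three stored fields.
   sign_bit   : 0 = positive, 1 = negative
   stored_exp : 11-bit exponent as stored (biased by 1023)
   mantissa   : 52-bit stored fraction */
double double_from_fields(unsigned sign_bit, unsigned stored_exp, uint64_t mantissa)
{
    double sign = sign_bit ? -1.0 : 1.0;
    double fraction;

    assert(stored_exp < 2047u);          /* 2047 = infinity/NaN, not covered */
    assert(mantissa < (1ull << 52));     /* mantissa must fit in 52 bits */

    fraction = ldexp((double)mantissa, -52);   /* the ".mantissa" part */

    if (stored_exp == 0) {
        /* Denormalized (or zero): sign x 0.mantissa x 2^-1022 */
        return sign * ldexp(fraction, -1022);
    }
    /* Normalized: sign x 1.mantissa x 2^(exponent - 1023) */
    return sign * ldexp(1.0 + fraction, (int)stored_exp - 1023);
}
```

For example, double_from_fields(1, 1026, 0x9000000000000) yields -12.5, the value used in the worked example on this page.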
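If the stored exponent is zero and the mantissa is non-zero, the floating-point value is a denormalized number. For denormalized numbers, the exponent is treated as if a 1 were stored. Hence, the actual exponent is -1022 (1 minus 1023).

Double-precision floating-point numbers are stored using the following 64-bit format:

Bit 63: S
Bits 62-52: E
Bits 51-0: M

Where

S is the sign bit (0 is positive, 1 is negative).
E is the 11-bit exponent, stored with a bias of 1023.
M is the 52-bit mantissa.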
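As a rough sketch of how the sign, exponent, and mantissa fields of the 64-bit format can be pulled apart in C, the following assumes a C99 environment where double is a 64-bit IEEE 754 value. The names double_fields and split_double are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a double. */
struct double_fields {
    unsigned sign;      /* 1 bit: 0 = positive, 1 = negative */
    unsigned exponent;  /* 11 bits, stored with a bias of 1023 */
    uint64_t mantissa;  /* 52 bits, without the implied leading 1 */
};

struct double_fields split_double(double value)
{
    struct double_fields f;
    uint64_t bits;

    /* This sketch only makes sense for a 64-bit double. */
    assert(sizeof value == sizeof bits);

    /* memcpy copies the storage bits without violating aliasing rules. */
    memcpy(&bits, &value, sizeof bits);

    f.sign     = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FFu);
    f.mantissa = bits & 0x000FFFFFFFFFFFFFull;
    return f;
}
```

Calling split_double(-12.5) gives a sign of 1, a stored exponent of 1026, and a mantissa of 0x9000000000000.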
Using the above format, the floating-point number -12.5 is stored as a hexadecimal value of 0xC029000000000000. In memory, this value appears as follows:
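One way to inspect the stored bytes on a given target is sketched below. The order of the bytes in memory depends on the byte order of the target, so this example makes no claim about the layout on any particular device. It assumes a C99 environment with a 64-bit IEEE 754 double, and the function names double_bits and double_storage_bytes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the storage of a double as a 64-bit integer. */
uint64_t double_bits(double value)
{
    uint64_t bits;
    assert(sizeof value == sizeof bits);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Hypothetical helper: copy the eight storage bytes of a double into
   out[0..7], where out[0] is the byte at the lowest address. */
void double_storage_bytes(double value, unsigned char out[8])
{
    assert(sizeof value == 8);
    memcpy(out, &value, 8);
}
```

On a little-endian target, double_storage_bytes(-12.5, buf) is expected to place 0xC0 in buf[7] and 0x29 in buf[6], with zeros in the remaining bytes. On a big-endian target, the same two bytes would appear in buf[0] and buf[1].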
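It is fairly simple to convert floating-point numbers to and from their hexadecimal storage equivalents. The following example demonstrates how this is done for the value -12.5 shown above.

The floating-point storage representation is not an intuitive format. To convert this to a floating-point number, the bits must be separated as specified in the floating-point number storage format table shown above. For example:

Hexadecimal: 0xC029000000000000
Sign (1 bit): 1
Exponent (11 bits): 10000000010
Mantissa (52 bits): 1001000000000000000000000000000000000000000000000000

From this illustration, you can determine the following:

The sign bit is 1, indicating a negative number.
The exponent value is 10000000010 binary or 1026 decimal. Subtracting 1023 from 1026 leaves 3, which is the actual exponent.
The mantissa is the following binary number: 1001000000000000000000000000000000000000000000000000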
There is an understood binary point at the left of the mantissa that is always preceded by a 1. This digit is omitted from the stored form of the floating-point number. Adding 1 and the binary point to the beginning of the mantissa gives the following value:

1.1001000000000000000000000000000000000000000000000000

To adjust the mantissa for the exponent, move the binary point to the left for negative exponent values or to the right for positive exponent values. Since the exponent is three, the mantissa is adjusted as follows:

1100.1000000000000000000000000000000000000000000000000

The result is a binary floating-point number. Binary digits to the left of the binary point represent the power of two corresponding to their position. For example, 1100 represents (1 × 2^{3}) + (1 × 2^{2}) + (0 × 2^{1}) + (0 × 2^{0}), which is 12.

Binary digits to the right of the binary point also represent the power of two corresponding to their position. However, the powers are negative. For example, .100... represents (1 × 2^{-1}) + (0 × 2^{-2}) + (0 × 2^{-3}) + ... which equals .5.

The sum of these values is 12.5. Because the sign bit was set, this number should be negative. So, the hexadecimal value 0xC029000000000000 is -12.5.
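The manual conversion can be mirrored in C by prepending the implied 1 to the mantissa and then splitting the result at the shifted binary point. This is only a sketch under stated assumptions: it handles normalized values whose actual exponent lies between 0 and 52, it assumes a C99 environment, and the function name decode_by_shifting is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode a 64-bit storage pattern by moving the
   binary point, as in the worked example. Only normalized values with
   an actual exponent of 0..52 are supported. */
double decode_by_shifting(uint64_t bits)
{
    unsigned sign_bit    = (unsigned)(bits >> 63);
    int      exponent    = (int)((bits >> 52) & 0x7FFu) - 1023;
    uint64_t mantissa    = bits & 0x000FFFFFFFFFFFFFull;
    uint64_t significand = (1ull << 52) | mantissa;   /* 1.mantissa */
    int      frac_bits;
    uint64_t whole;
    uint64_t frac;
    double   magnitude;

    assert(exponent >= 0 && exponent <= 52);

    /* Moving the binary point right by 'exponent' places leaves this
       many digits to the right of it. */
    frac_bits = 52 - exponent;

    whole = significand >> frac_bits;                  /* digits left of the point */
    frac  = significand & ((1ull << frac_bits) - 1u);  /* digits right of the point */

    magnitude = (double)whole + (double)frac / (double)(1ull << frac_bits);
    return sign_bit ? -magnitude : magnitude;
}
```

For 0xC029000000000000, the digits to the left of the binary point are 1100 (12), the digits to the right contribute .5, and the set sign bit makes the result -12.5.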
