Keil Logo Arm Logo

Technical Support

On-Line Manuals

Libraries and Floating Point Support Guide

Conventions and feedback The ARM C and C++ libraries The ARM C micro-library Floating-point support About floating-point support The software floating-point library, fplib Calling fplib routines fplib arithmetic on numbers in a particular format fplib conversions between floats, doubles, and int fplib conversion between long longs, floats, and d fplib comparisons between floats and doubles fplib C99 functions Controlling the ARM floating-point environment Floating-point functions for compatibility with Mi C99-compatible functions for controlling the ARM f C99 rounding mode and floating-point exception mac Exception flag handling Functions for handling rounding modes Functions for saving and restoring the whole float Functions for temporarily disabling exceptions ARM floating-point compiler extensions to the C99 Writing a custom exception trap handler Example of a custom exception handler Exception trap handling by signals Using C99 signalling NaNs provided by mathlib (_WA mathlib double and single-precision floating-point Nonstandard functions in mathlib IEEE 754 arithmetic Basic data types for IEEE 754 arithmetic Single precision data type for IEEE 754 arithmetic Double precision data type for IEEE 754 arithmetic Sample single precision floating-point values for Sample double precision floating-point values for IEEE 754 arithmetic and rounding Exceptions arising from IEEE 754 floating-point ar Ignoring exceptions from IEEE 754 floating-point a Trapping exceptions from IEEE 754 floating-point a Exception types recognized by the ARM floating-poi Using the Vector Floating-Point (VFP) support libr

Libraries and Floating Point Support Guide

IEEE 754 arithmetic and rounding

IEEE 754 arithmetic and rounding

Arithmetic is generally performed by computing the result of an operation as if it were stored exactly (to infinite precision), and then rounding it to fit in the format. Apart from operations whose result already fits exactly into the format (such as adding 1.0 to 1.0), the correct answer is generally somewhere between two representable numbers in the format. The system then chooses one of these two numbers as the rounded result. It uses one of the following methods:

Round to nearest

The system chooses the nearer of the two possible outputs. If the correct answer is exactly halfway between the two, the system chooses the output where the least significant bit of Frac is zero. This behavior (round-to-even) prevents various undesirable effects.

This is the default mode when an application starts up. It is the only mode supported by the ordinary floating-point libraries. Hardware floating-point environments and the enhanced floating-point libraries support all four rounding modes.

Round up, or round toward plus infinity

The system chooses the larger of the two possible outputs (that is, the one further from zero if they are positive, and the one closer to zero if they are negative).

Round down, or round toward minus infinity

The system chooses the smaller of the two possible outputs (that is, the one closer to zero if they are positive, and the one further from zero if they are negative).

Round toward zero, or chop, or truncate

The system chooses the output that is closer to zero, in all cases.

Show/hideSee also

Concepts
Reference
Other information
Copyright © 2007-2008, 2011-2012 ARM. All rights reserved.ARM DUI 0378D
Non-ConfidentialID062912

arm-logo-small

Keil logo
Important information

This site uses cookies to store information on your computer. By continuing to use our site, you consent to our cookies.