
Libraries and Floating Point Support Guide

3.5.6 IEEE 754 arithmetic and rounding

IEEE 754 defines different rounding rules to use when calculating arithmetic results.

Arithmetic is generally performed by computing the result of an operation as if it were stored exactly (to infinite precision), and then rounding it to fit in the format. Apart from operations whose result already fits exactly into the format (such as adding 1.0 to 1.0), the correct answer is generally somewhere between two representable numbers in the format. The system then chooses one of these two numbers as the rounded result. It uses one of the following methods:
Round to nearest
The system chooses the nearer of the two possible outputs. If the correct answer is exactly halfway between the two, the system chooses the output where the least significant bit of Frac is zero. This tie-breaking rule (round-to-even) avoids the systematic bias that always rounding halfway cases in the same direction would introduce.
This is the default mode when an application starts up. It is the only mode supported by the ordinary floating-point libraries. Hardware floating-point environments and the enhanced floating-point libraries support all four rounding modes.
Round up, or round toward plus infinity
The system chooses the larger of the two possible outputs (that is, the one further from zero if they are positive, and the one closer to zero if they are negative).
Round down, or round toward minus infinity
The system chooses the smaller of the two possible outputs (that is, the one closer to zero if they are positive, and the one further from zero if they are negative).
Round toward zero, or chop, or truncate
The system chooses the output that is closer to zero, in all cases.
Non-Confidential. ARM DUI0378H
Copyright © 2007, 2008, 2011, 2012, 2014-2016 ARM. All rights reserved. 