Loop unrolling in C code
Loops are a common construct in most programs. Because a significant amount of execution time is often spent in loops, it is worthwhile paying attention to time-critical loops.
Small loops can be unrolled for higher performance, at the cost of increased code size. When a loop is unrolled, the loop counter is updated less often and fewer branches are executed. If the loop iterates only a few times, it can be fully unrolled so that the loop overhead disappears completely. The compiler unrolls loops automatically at
Manual unrolling of loops might hinder the automatic re-rolling of loops and other loop optimizations by the compiler.
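As a minimal illustration of the mechanics described above, the following sketch shows a simple summation loop, the same loop unrolled by hand four times with a remainder loop for element counts that are not a multiple of four, and a loop with a fixed trip count of four that has been fully unrolled. The function names `sum_rolled`, `sum_unrolled4`, and `dot4_unrolled` are hypothetical and exist only for this example. Whether manual unrolling pays off depends on the compiler and the target, and it can interfere with the compiler's own loop optimizations.

```c
#include <stddef.h>

/* Hypothetical example functions, not taken from the guide. */

/* Rolled loop: one counter update and one branch per element. */
int sum_rolled(const int *a, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Unrolled four times: one counter update and one branch per four
 * elements, followed by a short loop for the remaining 0-3 elements. */
int sum_unrolled4(const int *a, size_t n)
{
    int sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    for (; i < n; i++)   /* remainder when n is not a multiple of 4 */
        sum += a[i];
    return sum;
}

/* Fully unrolled: the trip count is fixed at 4, so no loop overhead remains. */
int dot4_unrolled(const int a[4], const int b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```

The remainder loop is what keeps the unrolled version correct for any element count; a fully unrolled loop needs no remainder handling because its trip count is known in advance.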
The advantages and disadvantages of loop unrolling can be illustrated using the two sample routines shown in Table 5. In both routines, each bit is tested efficiently by extracting the lowest bit and counting it, after which the bit is shifted out.
The first implementation uses a loop to count bits. The second routine is the first implementation unrolled four times, with an optimization applied by combining the four shifts of
Unrolling frequently provides new opportunities for optimization.
Table 5. C code for rolled and unrolled bit-counting loops
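The following is a hedged sketch of routines of the kind that Table 5 describes; it is not necessarily identical to the listing in the table. The first function counts set bits one at a time in a loop. The second unrolls that loop four times, tests the four lowest bits with masks, and then combines the four single-bit shifts into one shift by four. The names `countbit_rolled` and `countbit_unrolled` are hypothetical.

```c
/* Hypothetical sketch; illustrative only, not the Table 5 listing. */

/* Rolled version: tests and shifts out one bit per iteration. */
int countbit_rolled(unsigned int n)
{
    int bits = 0;
    while (n != 0) {
        if (n & 1u)      /* extract the lowest bit and count it */
            bits++;
        n >>= 1;         /* shift the tested bit out */
    }
    return bits;
}

/* Unrolled four times: tests the four lowest bits, then replaces the
 * four single-bit shifts with one shift by 4. */
int countbit_unrolled(unsigned int n)
{
    int bits = 0;
    while (n != 0) {
        if (n & 1u) bits++;
        if (n & 2u) bits++;
        if (n & 4u) bits++;
        if (n & 8u) bits++;
        n >>= 4;         /* combined shift */
    }
    return bits;
}
```

Both functions return the same count. For an input whose highest set bit is bit 31, the rolled version runs 32 iterations and the unrolled version runs 8, so the loop test, branch, and shift execute a quarter as often.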
Table 6 shows the disassembly of the machine code that the compiler produces for each sample implementation in Table 5, where the C code for each implementation has been compiled using the option
Table 6. Disassembly for rolled and unrolled bit-counting loops
On an ARM9 processor, the rolled bit-counting loop shown in the leftmost column of Table 6 takes six cycles to check a single bit, and its code size is only nine instructions. The unrolled version checks four bits per loop iteration, taking on average only three cycles per bit. The cost is a larger code size of fifteen instructions.
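As a rough worked example using only the figures quoted above, assume an input whose highest set bit is bit 31, so that all 32 bits are examined. Ignoring call overhead and any variation in actual cycle counts, the per-bit costs translate into the following approximate totals:

```latex
\[
\begin{aligned}
\text{rolled:}   \quad & 32 \times 6 = 192 \text{ cycles}, \quad 9 \text{ instructions} \\
\text{unrolled:} \quad & 32 \times 3 = 96 \text{ cycles},  \quad 15 \text{ instructions}
\end{aligned}
\]
```

Under these assumptions, unrolling roughly halves the cycle count in exchange for six extra instructions, about 67% more code.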