RealView Compiler User's Guide: Optimizing loops
Loops are a common construct in most programs. Because a significant amount of execution time is often spent in loops, it is worth paying attention to time-critical loops. The loop termination condition can cause significant overhead if written without caution. Where possible, initialize the loop counter to the number of iterations required and count down to zero.
Table 4.1 shows two sample implementations of a routine, one written with an incrementing loop and one with a decrementing loop.
Table 4.1. C code for incrementing and decrementing loops
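The two loop styles named in the caption can be sketched as follows. This is a minimal illustration, assuming a factorial routine as the workload; the function names fact_inc and fact_dec are hypothetical and are not taken from the guide.

```c
/* Incrementing loop: the counter i must be compared against n on
 * every iteration, so n has to stay live for the whole loop. */
unsigned int fact_inc(unsigned int n)
{
    unsigned int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return fact;
}

/* Decrementing loop: the counter starts at the iteration count and
 * runs down to zero, so the termination test is a compare with zero. */
unsigned int fact_dec(unsigned int n)
{
    unsigned int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return fact;
}
```

Both functions return the same result for any n whose factorial fits in an unsigned int; only the shape of the termination test differs.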
Table 4.2 shows the corresponding disassembly of the machine code produced by the compiler for each of the sample implementations of Table 4.1, where the C code for both implementations has been compiled using the same options.
Table 4.2. C disassembly for incrementing and decrementing loops
Comparing the disassemblies of Table 4.2 shows that the decrementing loop saves an instruction in the loop.
The technique of initializing the loop counter to the number of iterations required, and then decrementing down to zero, also applies to while and do statements.
Small loops can be unrolled for higher performance, at the cost of increased code size. When a loop is unrolled, the loop counter is updated less often and fewer branches are executed. If the loop iterates only a few times, it can be fully unrolled, so that the loop overhead disappears completely. The ARM compiler unrolls loops automatically at certain optimization settings.
Note: Manual unrolling of loops might hinder the automatic re-rolling of loops and other loop optimizations by the compiler.
The advantages and disadvantages of loop unrolling can be illustrated using the two sample routines shown in Table 4.3. Both routines test a single bit at a time by extracting the lowest bit and counting it, after which the bit is shifted out. The first implementation uses a loop to count bits. The second routine is the first unrolled four times, with an additional optimization that combines the four shifts into one.
Table 4.3. C code for rolled and unrolled bit‑counting loops
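The rolled and unrolled routines described above can be sketched as follows. This is an illustration under stated assumptions rather than the guide's own listing: the names countbits_rolled and countbits_unrolled are hypothetical, and the unrolled version tests four bits per iteration and replaces four single-bit shifts with one shift by four.

```c
/* Rolled version: test the lowest bit, count it, shift it out. */
int countbits_rolled(unsigned int n)
{
    int bits = 0;
    while (n != 0) {
        if (n & 1) bits++;
        n >>= 1;
    }
    return bits;
}

/* Unrolled four times: four bit tests per iteration, with the
 * four single-bit shifts combined into one shift by four. */
int countbits_unrolled(unsigned int n)
{
    int bits = 0;
    while (n != 0) {
        if (n & 1) bits++;
        if (n & 2) bits++;
        if (n & 4) bits++;
        if (n & 8) bits++;
        n >>= 4;
    }
    return bits;
}
```

The unrolled loop executes one branch and one shift per four bits instead of per bit, which is where the saving comes from; the price is the extra test instructions in the loop body.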
Table 4.4 shows the corresponding disassembly of the machine code produced by the compiler for each of the sample implementations of Table 4.3, where the C code for each implementation has been compiled using the same option.
Table 4.4. Disassembly for rolled and unrolled bit‑counting loops
On the ARM7, checking a single bit takes six cycles in the disassembly of the bit‑counting loop shown in the leftmost column. The code size is only nine instructions. The unrolled version of the bit‑counting loop checks four bits at a time, taking on average only three cycles per bit. However, the cost is the larger code size of fifteen instructions.