Assembler User Guide
Illustration of the benefits of using conditional instructions

This section illustrates the difference between using branches and using conditional instructions. It uses Euclid's algorithm for the Greatest Common Divisor (gcd) to demonstrate how conditional instructions improve code size and speed. In C the gcd algorithm can be expressed as:
int gcd(int a, int b)
{
    while (a != b)
    {
        if (a > b)
            a = a - b;
        else
            b = b - a;
    }
    return a;
}
The following examples show implementations of the gcd algorithm with and without conditional instructions.

Note: The detailed analysis of execution speed only applies to an ARM7™ processor. The code density calculations apply to all ARM processors.

This is an ARM code implementation of the gcd algorithm using branches, without using any other conditional instructions. Conditional execution is achieved by using conditional branches, rather than individual conditional instructions:
gcd     CMP      r0, r1
        BEQ      end
        BLT      less
        SUBS     r0, r0, r1  ; could be SUB r0, r0, r1 for ARM
        B        gcd
less
        SUBS     r1, r1, r0  ; could be SUB r1, r1, r0 for ARM
        B        gcd
end
The code is seven instructions long because of the number of branches. Every time a branch is taken, the processor must refill the pipeline and continue from the new location. The other instructions and non-executed branches use a single cycle each.

The following table shows the number of cycles this implementation uses on an ARM7 processor when R0 equals 1 and R1 equals 2.

Table 17. Conditional branches only
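As a rough check on the cycle counts, the following C sketch walks the branch-only loop and adds up cycles. It assumes a cost model that the text describes but does not put into numbers: a taken branch costs three cycles on an ARM7 (pipeline refill), and every other instruction, including a branch that is not taken, costs one. The function name gcd_branch_cycles and the two constants are hypothetical names chosen for illustration, and both inputs must be positive for the loop to terminate.

```c
/* Assumed ARM7 cost model (an assumption, not stated numerically in the text):
   a taken branch costs 3 cycles, any other instruction costs 1. */
enum { CYCLES_PLAIN = 1, CYCLES_TAKEN_BRANCH = 3 };

/* Hypothetical helper: cycles used by the branch-only gcd loop.
   r0 and r1 mirror the registers in the listing; both must be > 0. */
unsigned gcd_branch_cycles(int r0, int r1)
{
    unsigned cycles = 0;

    for (;;) {
        cycles += CYCLES_PLAIN;              /* gcd: CMP r0, r1 */

        if (r0 == r1) {
            cycles += CYCLES_TAKEN_BRANCH;   /* BEQ end (taken) */
            return cycles;
        }
        cycles += CYCLES_PLAIN;              /* BEQ end (not taken) */

        if (r0 < r1) {
            cycles += CYCLES_TAKEN_BRANCH;   /* BLT less (taken) */
            r1 -= r0;                        /* less: SUBS r1, r1, r0 */
            cycles += CYCLES_PLAIN;
        } else {
            cycles += CYCLES_PLAIN;          /* BLT less (not taken) */
            r0 -= r1;                        /* SUBS r0, r0, r1 */
            cycles += CYCLES_PLAIN;
        }
        cycles += CYCLES_TAKEN_BRANCH;       /* B gcd (always taken) */
    }
}
```

Under these assumptions, gcd_branch_cycles(1, 2) evaluates to 13: one pass through the "less" path (9 cycles) followed by the final compare and taken BEQ (4 cycles).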
This is an ARM code implementation of the gcd algorithm using individual conditional instructions. The gcd algorithm takes only four instructions:
gcd
        CMP      r0, r1
        SUBGT    r0, r0, r1
        SUBLE    r1, r1, r0
        BNE      gcd
In addition to improving code size, in most cases this code executes faster than the version that uses only branches.

The following table shows the number of cycles this implementation uses on an ARM7 processor when R0 equals 1 and R1 equals 2.

Table 18. All instructions conditional
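The same kind of count can be sketched in C for the four-instruction loop. The cost model is again an assumption rather than something the text states in numbers: three cycles for a taken branch on an ARM7, one cycle for everything else, including a conditional instruction whose condition fails. The name gcd_cond_cycles and the constants are hypothetical, and both inputs must be positive.

```c
/* Assumed ARM7 cost model (an assumption, not stated numerically in the text):
   a taken branch costs 3 cycles, any other instruction costs 1,
   whether or not its condition passes. */
enum { CYCLES_PLAIN = 1, CYCLES_TAKEN_BRANCH = 3 };

/* Hypothetical helper: cycles used by the conditional-instruction gcd loop.
   r0 and r1 mirror the registers in the listing; both must be > 0. */
unsigned gcd_cond_cycles(int r0, int r1)
{
    unsigned cycles = 0;

    for (;;) {
        /* CMP r0, r1 sets the flags; the SUBs below do not change them. */
        int gt = (r0 > r1);
        int ne = (r0 != r1);
        cycles += CYCLES_PLAIN;

        cycles += CYCLES_PLAIN;              /* SUBGT r0, r0, r1 */
        if (gt)
            r0 -= r1;

        cycles += CYCLES_PLAIN;              /* SUBLE r1, r1, r0 */
        if (!gt)
            r1 -= r0;                        /* also runs when equal; r0 still holds the result */

        if (!ne) {
            cycles += CYCLES_PLAIN;          /* BNE gcd (not taken) */
            return cycles;
        }
        cycles += CYCLES_TAKEN_BRANCH;       /* BNE gcd (taken) */
    }
}
```

Under these assumptions, gcd_cond_cycles(1, 2) evaluates to 10: six cycles for the first pass, which ends in a taken BNE, and four for the final pass, where no branch is taken.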
Comparing this with the example that uses only branches:
In architectures ARMv6T2 and later, you can use the IT instruction to make instructions conditional in Thumb code:
gcd
        CMP      r0, r1
        ITE      GT
        SUBGT    r0, r0, r1
        SUBLE    r1, r1, r0
        BNE      gcd
This assembles equally well to ARM or Thumb code. The assembler checks the IT instructions, but omits them on assembly to ARM code. It requires one more instruction in Thumb code (the IT instruction) than in ARM code.

In architectures before ARMv6T2, there is no IT instruction, so in Thumb code the gcd algorithm must be written with conditional branches. The Thumb code implementation of the gcd algorithm without conditional instructions requires seven instructions. The overall code size is 14 bytes. This is even smaller than the ARM implementation that uses conditional instructions, which uses 16 bytes.

In addition, on a system using 16-bit memory this Thumb implementation runs faster than both ARM implementations because only one memory access is required for each 16-bit Thumb instruction, whereas each 32-bit ARM instruction requires two fetches.
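The size and fetch figures above follow from simple arithmetic, sketched here in C. The helper names code_size_bytes and fetches_per_instruction are hypothetical, and the sketch assumes every instruction in a listing has the same width (4 bytes for ARM, 2 bytes for the 16-bit Thumb instructions discussed here).

```c
/* Hypothetical helper: total code size when every instruction has the same width. */
unsigned code_size_bytes(unsigned instruction_count, unsigned bytes_per_instruction)
{
    return instruction_count * bytes_per_instruction;
}

/* Hypothetical helper: memory accesses needed to fetch one instruction
   over a bus of the given width (rounded up to whole accesses). */
unsigned fetches_per_instruction(unsigned instruction_bytes, unsigned bus_bytes)
{
    return (instruction_bytes + bus_bytes - 1) / bus_bytes;
}
```

For example, code_size_bytes(7, 2) gives the 14 bytes of the seven-instruction Thumb version and code_size_bytes(4, 4) gives the 16 bytes of the four-instruction ARM version. On 16-bit memory, fetches_per_instruction(2, 2) is 1 for a Thumb instruction and fetches_per_instruction(4, 2) is 2 for an ARM instruction.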