BTB operation (TIGER-Cortex-A8).
I'm writing some tests on the TIGER platform and I'm having trouble
understanding how the BTB operates.
The test runs 40 loop iterations. After iterations 10, 20 and 30 I
insert some extra code. In every iteration I measure how many cycles
the core needs between the last instruction of the loop (a conditional
branch that is always taken) and the first instruction of the loop.
In effect, this shows whether that last branch is predicted taken.
1. After 10 loops I add some extra NOPs.
2. After 20 loops I add a BTB invalidation instruction.
3. After 30 loops I write all 512 entries of the BTB with some new
The problem is that in all cases the cycle count behaves the same way:
First 10 loops: loops 1-2 pay a penalty of about 13 cycles; loops
3-10 take 1 or 2 cycles.
Second 10 loops (after the extra NOPs): loops 11-12 pay about 13
cycles; loops 13-20 take 1 or 2 cycles.
Third 10 loops (after the BTB invalidation): loops 21-22 pay about 13
cycles; loops 23-30 take 1 or 2 cycles.
Fourth 10 loops (after writing the BTB entries): loops 31-32 pay about
13 cycles; loops 33-40 take 1 or 2 cycles.
I expected that, in the first case (extra NOPs only), the branch would
stay predicted taken for the following 10 loops, so the first two of
them would not pay the 13-cycle penalty.
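The pattern of exactly two slow loops per block can be reproduced with a toy predictor. The sketch below is a minimal model, assuming a single 2-bit saturating counter for the loop branch (0-1 predict not taken, 2-3 predict taken) and assuming that each piece of extra code leaves that counter in the strongly-not-taken state (0). Neither assumption is documented Cortex-A8 behavior; it is only one hypothesis that fits the numbers. `simulate`, `MISPREDICT_PENALTY` and `PREDICTED_COST` are hypothetical names, and the 13-cycle and 1-cycle costs come from the measurements above.

```python
# Toy model of a 2-bit saturating counter for the always-taken loop branch.
# ASSUMPTION (hypothetical, not documented Cortex-A8 behavior): the extra code
# run before loops 11, 21 and 31 resets the counter to strongly not taken (0).

MISPREDICT_PENALTY = 13  # cycles, as measured in the post
PREDICTED_COST = 1       # the post sees 1 or 2 cycles; 1 is used here


def simulate(n_loops=40, disturb_at=(10, 20, 30)):
    """Return the modeled cycle cost of the loop branch for each iteration."""
    counter = 0  # assumed cold state before loop 1
    costs = []
    for loop in range(1, n_loops + 1):
        if loop - 1 in disturb_at:
            counter = 0  # hypothetical effect of the extra code
        predicted_taken = counter >= 2
        # The loop branch is always taken, so "not taken" is a mispredict.
        costs.append(PREDICTED_COST if predicted_taken else MISPREDICT_PENALTY)
        counter = min(counter + 1, 3)  # saturate at strongly taken
    return costs


costs = simulate()
slow = [i + 1 for i, c in enumerate(costs) if c == MISPREDICT_PENALTY]
print(slow)  # [1, 2, 11, 12, 21, 22, 31, 32]
```

In this model, a counter that restarted in the weakly-not-taken state (1) would cost only one slow loop per block, so two slow loops is what a strongly-not-taken restart predicts.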
My questions are:
- Is the behavior described above the expected core behavior? If so,
why? And can I do anything to avoid the 13-cycle penalty when the
branch instruction is already in the BTB but some extra code containing
other branches has run in between?