armlink has global visibility of all your program code and so can perform some additional branch optimizations.
armlink uses branch inlining to optimize small function calls in your image. A small function is defined as any one-instruction function that can be inlined into the 4 bytes of a BL or BLX instruction. In this case, there is no branch and, therefore, the return address is redundant.
Note
This branch optimization is off by default because enabling it changes the image such that debug information might be incorrect. If enabled, the linker makes no attempt to correct the debug information.
Use the following command-line options to control branch inlining:
‑‑no_branchnop
By default, the linker replaces any branch whose relocation resolves to the next instruction with a NOP. However, there are cases where you might want to disable this behavior, for example, when performing verification or pipeline flushes.
Use the ‑‑no_branchnop option to disable this behavior.
‑‑inline
Enables branch inlining. See Controlling inlining for more information.
‑‑tailreorder
Moves tail calling sections immediately before their target, if possible, to optimize function calls. See Handling tail calling sections for more information.
If you enable branch inlining, armlink scans each function call in the image and then inlines where applicable. When armlink inlines a function, it removes the reference to the called function from the caller. armlink applies this optimization before any unused sections are eliminated so that any section that is always inlined can then be removed.
Use the ‑‑info command-line option to display information about branch inlining:
‑‑info inline
Displays a message each time a function is inlined and gives the total number of inlines, for example:
Small function inlining results
Inlined function __Heap_DescSize from object h1_alloc.o at offset 0x5c in section .text from object malloc.o.
Inlined function __ieee_status from object istatus.o at offset 0x40 in section .text from object _printf_fp_dec.o.
.
Inlined total of 6 calls.
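If you want to post-process this report, for example to list which objects received inlined calls, the messages can be parsed with a short script. The following is a minimal sketch in Python, not part of armlink. The message patterns are inferred only from the sample output above and might differ between linker versions, and the names parse_inline_report, INLINE_RE, and TOTAL_RE are hypothetical.

```python
import re

# Patterns inferred from the sample messages only; other armlink versions
# might word these lines differently.
INLINE_RE = re.compile(
    r"Inlined function (?P<callee>\S+) from object (?P<callee_obj>\S+) "
    r"at offset (?P<offset>0x[0-9a-fA-F]+) in section (?P<section>\S+) "
    r"from object (?P<caller_obj>\S+)\.$"
)
TOTAL_RE = re.compile(r"Inlined total of (\d+) calls?\.$")


def parse_inline_report(text):
    """Return (records, total) extracted from --info inline output."""
    records = []
    total = None
    for line in text.splitlines():
        line = line.strip()
        match = INLINE_RE.match(line)
        if match:
            records.append(match.groupdict())
            continue
        total_match = TOTAL_RE.match(line)
        if total_match:
            total = int(total_match.group(1))
    return records, total


SAMPLE = """Small function inlining results
Inlined function __Heap_DescSize from object h1_alloc.o at offset 0x5c in section .text from object malloc.o.
Inlined function __ieee_status from object istatus.o at offset 0x40 in section .text from object _printf_fp_dec.o.
.
Inlined total of 6 calls.
"""

records, total = parse_inline_report(SAMPLE)
for record in records:
    print(record["callee"], "->", record["caller_obj"], "at", record["offset"])
print("reported total:", total)
```

Lines that match neither pattern, such as the heading and the elision marker, are ignored, so the number of parsed records can be smaller than the reported total when the report is abbreviated.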
If you have enabled branch inlining, there are certain conditions that a function must meet in order to be inlined:
armlink handles only the simplest cases and does not inline any instruction that reads or writes to the PC because this depends on the location of the function.
If your image contains both ARM and Thumb code, functions that are called from the other state must be built for interworking. An ARM caller might inline a Thumb callee if an equivalent ARM instruction is available. However, a Thumb caller cannot inline an ARM callee. Also, a Thumb caller can inline up to two 16-bit Thumb instructions, whereas an ARM caller can inline only a single 16-bit Thumb instruction.
The action of the linker also depends on the size of the symbol representing a function and on the caller (ARM or Thumb) and the callee (ARM or Thumb) as shown in Table 3.2.
Table 3.2. Inlining small functions
| Caller | Callee | Symbol size that can be inlined |
|---|---|---|
| ARM | ARM | 4 to 8 bytes |
| ARM | Thumb | 2 to 6 bytes |
| Thumb | Thumb | 2 to 6 bytes |
| Thumb | ARM | 4 to 8 bytes |
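The size check in Table 3.2 can be expressed as a simple lookup. The following Python sketch only mirrors the table. It is not armlink's implementation, and the names SIZE_LIMITS and size_allows_inlining are hypothetical. A size in range is necessary but not sufficient, because the other conditions in this section still apply.

```python
# Sketch only: encodes Table 3.2 as a lookup keyed by (caller, callee) state.
# Values are the inclusive (minimum, maximum) symbol size in bytes.
SIZE_LIMITS = {
    ("ARM", "ARM"): (4, 8),
    ("ARM", "Thumb"): (2, 6),
    ("Thumb", "Thumb"): (2, 6),
    ("Thumb", "ARM"): (4, 8),
}


def size_allows_inlining(caller, callee, symbol_size):
    """Return True if symbol_size is within the Table 3.2 range for the pair."""
    low, high = SIZE_LIMITS[(caller, callee)]
    return low <= symbol_size <= high


print(size_allows_inlining("ARM", "ARM", 8))      # True: 8 is within 4 to 8
print(size_allows_inlining("ARM", "Thumb", 8))    # False: 8 exceeds 2 to 6
print(size_allows_inlining("Thumb", "Thumb", 2))  # True: 2 is within 2 to 6
```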
In order to be inlined, the last instruction of a function must be either:
MOV pc, lr
or
BX lr
A function that consists of just a return sequence can be inlined as a NOP.
A conditional ARM instruction can only be inlined if either the condition on the BL matches the condition on the instruction being inlined, or the BL or instruction to be inlined is unconditional. For example, BLEQ can only inline an unconditional instruction like ADD or an instruction with a matching condition like ADDEQ.
An unconditional ARM BL can inline any conditional or unconditional instruction that satisfies all the other criteria.
A BL that is the last instruction of an IT block cannot inline a 16-bit Thumb instruction or a 32-bit MRS, MSR, or CPS instruction. This is because the IT block changes the behavior of the instructions within its scope so inlining the instruction changes the behavior of the program.
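The ARM condition-matching rule described above can be summarized in a few lines. This Python sketch is illustrative only and does not model the IT block restriction. The function name conditions_compatible is hypothetical, and an unconditional instruction is represented here, by assumption, as None or "AL".

```python
def conditions_compatible(bl_cond, insn_cond):
    """Return True if a BL with bl_cond can inline an instruction with insn_cond.

    Condition codes are strings such as "EQ" or "NE". None or "AL" is
    treated as unconditional (an assumption of this sketch).
    """
    def normalize(cond):
        if cond is None or cond.upper() == "AL":
            return None
        return cond.upper()

    bl = normalize(bl_cond)
    insn = normalize(insn_cond)
    # Either side being unconditional is enough; otherwise they must match.
    if bl is None or insn is None:
        return True
    return bl == insn


print(conditions_compatible("EQ", None))   # BLEQ inlining ADD   -> True
print(conditions_compatible("EQ", "EQ"))   # BLEQ inlining ADDEQ -> True
print(conditions_compatible("EQ", "NE"))   # BLEQ inlining ADDNE -> False
print(conditions_compatible(None, "NE"))   # BL inlining ADDNE   -> True
```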
Handling tail calling sections
As described in Controlling inlining, the linker replaces any branch whose relocation resolves to the next instruction with a NOP. This means that a tail calling section, that is, a section that finishes with a branch instruction, can be optimized if its target appears immediately after it in the execution region.
You can take advantage of this behavior by using the command-line option ‑‑tailreorder to move tail calling sections above their target. When you use this option, be aware that:
armlink can move only one tail calling section for each tail call target. If there are multiple tail calls to a single section, the tail calling section with an identical section name is moved before the target. If no tail calling section has a matching name, the linker moves the first tail calling section it encounters.
armlink cannot move a tail calling section out of its execution region.
armlink does not move tail calling sections before inline veneers.
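The selection rule above can be sketched as follows. This is an illustrative Python model, not armlink's implementation, and it rests on stated assumptions: "identical section name" is taken to mean the same name as the target section, the execution-region restriction is modeled by considering only callers in the target's region, and inline veneers are not modeled. Section and pick_tail_caller are hypothetical names, and the region name ER_RO in the usage lines is made up for the example.

```python
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    name: str    # section name, for example ".text"
    obj: str     # object file that contains the section
    region: str  # execution region the section is placed in


def pick_tail_caller(target: Section, callers: List[Section]) -> Optional[Section]:
    """Choose the single tail calling section to place before target."""
    # Assumption: a section cannot leave its execution region, so only
    # callers already in the target's region are candidates.
    candidates = [s for s in callers if s.region == target.region]
    # Prefer a caller whose section name matches the target's name.
    for section in candidates:
        if section.name == target.name:
            return section
    # Otherwise fall back to the first candidate encountered, if any.
    return candidates[0] if candidates else None


target = Section(".text", "sys_exit.o", "ER_RO")
callers = [
    Section("!!!main", "__main.o", "ER_RO"),
    Section(".text", "rt_raise.o", "ER_RO"),
]
chosen = pick_tail_caller(target, callers)
print(chosen.obj)  # rt_raise.o, because its section name matches the target
```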
Use the ‑‑info command-line option to display information about tail call optimization. For example, ‑‑info tailreorder gives information on any moved tail calling sections:
Tailcall reorder results
Tail calling Section !!!main from object __main.o placed before .text from kernel.o
Tail calling Section .text from object rt_raise.o placed before .text from sys_exit.o
Tail calling Section .text from object plibspace.o placed before .text from libspace.o
Tail calling Section .text from object aeabi_idiv0.o placed before .text from rt_div0.o
......