Keil Logo

Technical Support

On-Line Manuals

Compiler User Guide

Preface Overview of the Compiler Getting Started with the Compiler Compiler Features Compiler Coding Practices The compiler as an optimizing compiler Compiler optimization for code size versus speed Compiler optimization levels and the debug view Selecting the target processor at compile time Enabling FPU for bare-metal Optimization of loop termination in C code Loop unrolling in C code Compiler optimization and the volatile keyword Code metrics Code metrics for measurement of code size and data Stack use in C and C++ Benefits of reducing debug information in objects Methods of reducing debug information in objects a Guarding against multiple inclusion of header file Methods of minimizing function parameter passing o Returning structures from functions through regist Functions that return the same result when called Comparison of pure and impure functions Recommendation of postfix syntax when qualifying f Inline functions Compiler decisions on function inlining Automatic function inlining and static functions Inline functions and removal of unused out-of-line Automatic function inlining and multifile compilat Restriction on overriding compiler decisions about Compiler modes and inline functions Inline functions in C++ and C90 mode Inline functions in C99 mode Inline functions and debugging Types of data alignment Advantages of natural data alignment Compiler storage of data objects by natural byte a Relevance of natural data alignment at compile tim Unaligned data access in C and C++ code The __packed qualifier and unaligned data access i Unaligned fields in structures Performance penalty associated with marking whole Unaligned pointers in C and C++ code Unaligned Load Register (LDR) instructions generat Comparisons of an unpacked struct, a __packed stru Compiler support for floating-point arithmetic Default selection of hardware or software floating Example of hardware and software support differenc Vector Floating-Point (VFP) architectures Limitations on hardware handling of floating-point Implementation of Vector Floating-Point (VFP) supp Compiler and library support for half-precision fl Half-precision floating-point number format Compiler support for floating-point computations a Types of floating-point linkage Compiler options for floating-point linkage and co Floating-point linkage and computational requireme Processors and their implicit Floating-Point Units Integer division-by-zero errors in C code Software floating-point division-by-zero errors in About trapping software floating-point division-by Identification of software floating-point division Software floating-point division-by-zero debugging New language features of C99 New library features of C99 // comments in C99 and C90 Compound literals in C99 Designated initializers in C99 Hexadecimal floating-point numbers in C99 Flexible array members in C99 __func__ predefined identifier in C99 inline functions in C99 long long data type in C99 and C90 Macros with a variable number of arguments in C99 Mixed declarations and statements in C99 New block scopes for selection and iteration state _Pragma preprocessing operator in C99 Restricted pointers in C99 Additional library functions in C99 Complex numbers in C99 Boolean type and in C99 Extended integer types and functions in floating-point environment access in C99 snprintf family of functions in C99 type-generic math macros in C99 wide character I/O functions in C99 How to prevent uninitialized data from being initi Compiler Diagnostic Messages Using the Inline and Embedded Assemblers of the AR Compiler Command-line Options Language Extensions Compiler-specific Features C and C++ Implementation Details What is Semihosting? Via File Syntax Summary Table of GNU Language Extensions Standard C Implementation Definition Standard C++ Implementation Definition C and C++ Compiler Implementation Limits

Stack use in C and C++

4.11 Stack use in C and C++

C and C++ both use the stack intensively.

For example, the stack holds:
  • The return address of functions.
  • Registers that must be preserved, as determined by the ARM Architecture Procedure Call Standard (AAPCS), for instance, when register contents are saved on entry into subroutines.
  • Local variables, including local arrays, structures, unions, and in C++, classes.
Some stack usage is not obvious, such as:
  • Local integer or floating point variables are allocated stack memory if they are spilled (that is, not allocated to a register).
  • Structures are normally allocated to the stack. A space equivalent to sizeof(struct) padded to a multiple of four bytes is reserved on the stack. The compiler tries to allocate structures to registers instead.
  • If the size of an array size is known at compile time, the compiler allocates memory on the stack. Again, a space equivalent to sizeof(struct) padded to a multiple of four bytes is reserved on the stack.


    Memory for variable length arrays is allocated at runtime, on the heap.
  • Several optimizations can introduce new temporary variables to hold intermediate results. The optimizations include: CSE elimination, live range splitting and structure splitting. The compiler tries to allocate these temporary variables to registers. If not, it spills them to the stack.
  • Generally, code compiled for processors that support only 16-bit encoded Thumb instructions makes more use of the stack than ARM code and code compiled for processors that support 32-bit encoded Thumb instructions. This is because 16-bit encoded Thumb instructions have only eight registers available for allocation, compared to fourteen for ARM code and 32-bit encoded Thumb instructions.
  • The AAPCS requires that some function arguments are passed through the stack instead of the registers, depending on their type, size, and order.

Methods of estimating stack usage

Stack use is difficult to estimate because it is code dependent, and can vary between runs depending on the code path that the program takes on execution. However, it is possible to manually estimate the extent of stack utilization using the following methods:
  • Link with --callgraph to produce a static callgraph. This shows information on all functions, including stack use.
  • Link with --info=stack or --info=summarystack to list the stack usage of all global symbols.
  • Use the debugger to set a watchpoint on the last available location in the stack and see if the watchpoint is ever hit.


    μVision uses the ULINK platform with the CoreSight™ interface. These adaptors have no performance penalties.
  • Use the debugger, and:
    1. Allocate space in memory for the stack that is much larger than you expect to require.
    2. Fill the stack space with copies of a known value, for example, 0xDEADDEAD.
    3. Run your application, or a fixed portion of it. Aim to use as much of the stack space as possible in the test run. For example, try to execute the most deeply nested function calls and the worst case path found by the static analysis. Try to generate interrupts where appropriate, so that they are included in the stack trace.
    4. After your application has finished executing, examine the stack space of memory to see how many of the known values have been overwritten. The space has garbage in the used part and the known values in the remainder.
    5. Count the number of garbage values and multiply by four, to give their size, in bytes.
    The result of the calculation shows how the size of the stack has grown, in bytes.
  • Use RTSM, and define a region of memory where access is not allowed directly below your stack in memory, with a map file. If the stack overflows into the forbidden region, a data abort occurs, which can be trapped by the debugger.

Methods of reducing stack usage

In general, you can lower the stack requirements of your program by:
  • Writing small functions that only require a small number of variables.
  • Avoiding the use of large local structures or arrays.
  • Avoiding recursion, for example, by using an alternative algorithm.
  • Minimizing the number of variables that are in use at any given time at each point in a function.
  • Using C block scope and declaring variables only where they are required, so overlapping the memory used by distinct scopes.
The use of C block scope involves declaring variables only where they are required. This minimizes use of the stack by overlapping memory required by distinct scopes.


Code performance is optimized by locating the stack in fast (zero wait-state), on-chip, 32-bit RAM. The ARM (LDMFD and STMFD) and Thumb (PUSH and POP) stack access instructions both push and pop a number of 32-bit registers on or off the stack. If the stack is in 32-bit memory, each register access takes one cycle. However, if the stack is in 16-bit memory then each register access takes two cycles, reducing overall performance.
Non-ConfidentialPDF file icon PDF versionARM DUI0375H
Copyright © 2007, 2008, 2011, 2012, 2014-2016 ARM. All rights reserved. 
  Arm logo
Important information

This site uses cookies to store information on your computer. By continuing to use our site, you consent to our cookies.

Change Settings

Privacy Policy Update

Arm’s Privacy Policy has been updated. By continuing to use our site, you consent to Arm’s Privacy Policy. Please review our Privacy Policy to learn more about our collection, use and transfers
of your data.