4.4 Stack use in C and C++

C and C++ both use the stack intensively.

For example, the stack holds:

  • The return address of functions.

  • Registers that must be preserved, as determined by the Arm® Architecture Procedure Call Standard (AAPCS) or the Arm® Architecture Procedure Call Standard for the Arm® 64-bit Architecture (AAPCS64). For example, register contents are saved on entry into subroutines.

  • Local variables, including local arrays, structures, and unions.

  • Classes in C++.

Some stack usage is not obvious, such as:

  • If local integer or floating-point variables are spilled (that is, not allocated to a register), they are allocated stack memory.

  • Structures are normally allocated to the stack. A space equivalent to sizeof(struct) padded to a multiple of n bytes is reserved on the stack, where n is 16 for AArch64 state, or 8 for AArch32 state. However, the compiler might try to allocate structures to registers instead.

  • If the size of an array is known at compile time, the compiler allocates memory on the stack. Again, a space equivalent to sizeof(array) padded to a multiple of n bytes is reserved on the stack, where n is 16 for AArch64 state, or 8 for AArch32 state.

    Memory for variable length arrays is allocated at runtime, on the heap.

  • Several optimizations can introduce new temporary variables to hold intermediate results. The optimizations include: common subexpression elimination (CSE), live range splitting, and structure splitting. The compiler tries to allocate these temporary variables to registers. If it cannot, it spills them to the stack. For more information about what these optimizations do, see Overview of optimizations.

  • Generally, code that is compiled for processors that only support 16-bit encoded T32 instructions makes more use of the stack than A64 code, A32 code, and code that is compiled for processors that support 32-bit encoded T32 instructions. This is because 16-bit encoded T32 instructions have only eight registers available for allocation, compared to fourteen for A32 code and 32-bit encoded T32 instructions.

  • The AAPCS64 requires that some function arguments are passed through the stack instead of the registers, depending on their type, size, and order.

Processors for embedded applications have limited memory and therefore the amount of space available on the stack is also limited. You can use Arm Compiler to determine how much stack space is used by the functions in your application code. The amount of stack that a function uses depends on factors such as the number and type of arguments to the function, local variables in the function, and the optimizations that the compiler performs.

Methods of estimating stack usage

Stack use is difficult to estimate because it is code dependent, and can vary between runs depending on the code path that the program takes on execution. However, it is possible to manually estimate the extent of stack utilization using the following methods:

  • Compile with -g and link with --callgraph to produce a static callgraph. This callgraph shows information on all functions, including stack usage.

  • Link with --info=stack or --info=summarystack to list the stack usage of all global symbols.

  • Use a debugger to set a watchpoint on the last available location in the stack and see if the watchpoint is ever hit. Compile with the -g option to generate the necessary DWARF information.

  • Use a debugger, and:

    1. Allocate space in memory for the stack that is much larger than you expect to require.

    2. Fill the stack space with copies of a known value, for example, 0xDEADDEAD.

    3. Run your application, or a fixed portion of it. Aim to use as much of the stack space as possible in the test run. For example, try to execute the most deeply nested function calls and the worst case path that the static analysis finds. Try to generate interrupts where appropriate, so that they are included in the stack trace.

    4. After your application has finished executing, examine the stack space of memory to see how many of the known values have been overwritten. The space has garbage in the used part and the known values in the remainder.

    5. Count the number of garbage values and multiply by sizeof(value), to give their size, in bytes.

    The result is the amount of stack, in bytes, that the test run used.

  • Use a Fixed Virtual Platform (FVP) that corresponds to the target processor or architecture. With a map file, define a region of memory directly below your stack where access is forbidden. If the stack overflows into the forbidden region, a data abort occurs, which a debugger can trap.

Examining stack usage

It is good practice to examine the amount of stack that the functions in your application use. You can then consider rewriting your code to reduce stack usage.

To examine the stack usage in your application, use the linker option --info=stack. The following example code shows functions with different numbers of arguments:

__attribute__((noinline)) int fact(int n)
{
  int f = 1;
  while (n>0)
      f *= n--;
  return f;
}

int foo (int n)
{
  return fact(n);
}

int foo_mor (int a, int b, int c, int d)
{
  return fact(a);
}

int main (void)
{
  return foo(10) + foo_mor(10,11,12,13);
}

Copy the code example to file.c and compile it using the following command:

armclang --target=arm-arm-none-eabi -march=armv8-a -c -g file.c -o file.o

Compiling with the -g option generates the DWARF frame information that armlink requires for estimating the stack use. Run armlink on the object file using --info=stack:

armlink file.o --info=stack

For the example code, armlink shows the amount of stack that the various functions use. Function foo_mor has more arguments than function foo, and therefore uses more stack.

Stack Usage for fact 0xc bytes.
Stack Usage for foo 0x8 bytes.
Stack Usage for foo_mor 0x10 bytes.
Stack Usage for main 0x8 bytes.

You can also examine stack usage using the linker option --callgraph:

armlink file.o --callgraph -o FileImage.axf

This outputs a file called FileImage.htm which contains the stack usage information for the various functions in the application.

fact (ARM, 84 bytes, Stack size 12 bytes, file.o(.text))

Max Depth = 12
Call Chain = fact

[Called By]
>>   foo_mor
>>   foo

foo (ARM, 36 bytes, Stack size 8 bytes, file.o(.text))

Max Depth = 20
Call Chain = foo >> fact

[Calls]
>>   fact

[Called By]
>>   main

foo_mor (ARM, 76 bytes, Stack size 16 bytes, file.o(.text))

Max Depth = 28
Call Chain = foo_mor >> fact

[Calls]
>>   fact

[Called By]
>>   main

main (ARM, 76 bytes, Stack size 8 bytes, file.o(.text))

Max Depth = 36
Call Chain = main >> foo_mor >> fact

[Calls]
>>   foo_mor
>>   foo

[Called By]
>>   __rt_entry_main (via BLX)

See --info and --callgraph for more information on these options.

Methods of reducing stack usage

In general, you can lower the stack requirements of your program by:

  • Writing small functions that only require a few variables.

  • Avoiding the use of large local structures or arrays.

  • Avoiding recursion.

  • Minimizing the number of variables that are in use at any given time at each point in a function.

  • Using C block scope syntax and declaring variables only where they are required, so that distinct scopes can use the same memory.

Non-Confidential 100748_0616_01_en
Copyright © 2016–2021 Arm Limited or its affiliates. All rights reserved. 