10.1 Character sets and identifiers in ARM C and C++

Describes the character set and identifier implementation details in ARM C and C++.

The following point applies to the identifiers expected by the compiler:
  • Uppercase and lowercase characters are distinct in all internal and external identifiers. An identifier can also contain a dollar ($) character unless the --strict compiler option is specified. To permit dollar signs in identifiers with the --strict option, also use the --dollar command-line option.
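The case-sensitivity rule can be illustrated with a minimal C sketch. The three identifiers below differ only in case, so they name three separate objects. The names are hypothetical and chosen only for illustration. The sketch does not use `$` in identifiers, so it builds with any conforming C compiler, not only with armcc when --strict is absent.

```c
/* Identifiers that differ only in case are distinct (hypothetical names). */
int item_total = 1;   /* external identifier */
int Item_Total = 2;   /* a different external identifier */

int sum_case_variants(void) {
    int ITEM_TOTAL = 3;   /* block-scope identifier, distinct from both globals */
    return item_total + Item_Total + ITEM_TOTAL;   /* 1 + 2 + 3 */
}
```

Calling `sum_case_variants()` yields 6, because each spelling refers to a different object.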
The following points apply to the character sets expected by the compiler:
  • Calling setlocale(LC_CTYPE, "ISO8859-1") makes the isupper() and islower() functions behave as expected over the full 8-bit Latin-1 alphabet, rather than over the 7-bit ASCII subset. The locale must be selected at link time.
  • Source files are compiled according to the currently selected locale. You might have to change the locale using the --locale command-line option if the source file contains non-ASCII characters. If you do not specify --locale, the system locale is used.
  • The compiler supports multibyte character sets, such as Unicode. You can control this support using the --[no_]multibyte_chars options.
  • If the source file encoding is UTF-8 or UTF-16 and the file starts with a byte order mark, the compiler ignores the --[no_]multibyte_chars and --locale options and interprets the file as UTF-8 or UTF-16.
  • Other properties of the source character set are host-specific.
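The setlocale() point above can be observed by counting the byte values that isupper() accepts before and after the call. This is a sketch that assumes a hosted C library. The locale name "ISO8859-1" is the one given in the text, but other C libraries (for example glibc) name their locales differently. On those libraries setlocale() may return a null pointer, in which case the locale is left unchanged. The link-time locale selection that the ARM libraries require is toolchain-specific and is not shown. The helper names are hypothetical.

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>

/* Counts the byte values 0..UCHAR_MAX that isupper() treats as upper case
   in the current LC_CTYPE locale. In the "C" locale this is 26 (A..Z). */
int count_upper_case_bytes(void) {
    int count = 0;
    for (int c = 0; c <= UCHAR_MAX; ++c)   /* ctype functions accept unsigned char values */
        if (isupper(c))
            ++count;
    return count;
}

/* Tries to select the Latin-1 ctype locale named in the text.
   Returns 1 on success, 0 if the C library does not recognize the name
   (the locale is then unchanged). */
int select_latin1_ctype(void) {
    return setlocale(LC_CTYPE, "ISO8859-1") != NULL;
}
```

Where the Latin-1 locale is available, the count is expected to rise above 26, because Latin-1 adds upper-case letters such as À (0xC0). Where it is not available, the count stays at 26.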
The properties of the execution character set are target-specific. The ARM® C and C++ libraries support the ISO 8859-1 (Latin-1 Alphabet) character set with the following consequences:
  • The execution character set is identical to the source character set.
  • There are eight bits in a character in the execution character set.
  • There are four characters (bytes) in an int. If the memory system is:
    • little-endian, the bytes are ordered from least significant at the lowest address to most significant at the highest address
    • big-endian, the bytes are ordered from least significant at the highest address to most significant at the lowest address.
  • In C, all character constants have type int. In C++, a character constant containing one character has type char, and a character constant containing more than one character has type int. Up to four characters of the constant are represented in the integer value. The last character in the constant occupies the lowest-order byte of the integer value. Up to three preceding characters are placed at higher-order bytes. Unused bytes are filled with the NUL (\0) character.
  • All integer character constants that contain a single character, or character escape sequence, are represented in both the source and execution character sets. The following table lists the supported character escape codes.

    Table 10-1 Character escape codes

    Escape sequence   Char value   Description
    \a                7            Attention (bell)
    \b                8            Backspace
    \t                9            Horizontal tab
    \n                10           New line (line feed)
    \v                11           Vertical tab
    \f                12           Form feed
    \r                13           Carriage return
    \xnn              0xnn         ASCII code in hexadecimal
    \nnn              0nnn         ASCII code in octal
  • Characters of the source character set in string literals and character constants map identically into the execution character set.
  • Data items of type char are unsigned by default. Individual data items can be explicitly declared as signed char or unsigned char. The signedness of plain char can also be changed with command-line options:
    • the --signed_chars option makes char signed
    • the --unsigned_chars option makes char unsigned.

    Care must be taken when mixing translation units that have been compiled with and without the --signed_chars and --unsigned_chars options, and that share interfaces or data structures.
    The ARM ABI defines char as an unsigned byte, and this is the interpretation used by the C++ libraries supplied with the ARM compilation tools.
  • Converting multibyte characters into the corresponding wide characters for a wide character constant does not use a locale. This is not relevant to the generic implementation.
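Several points in the list above can be probed with a small C sketch: the byte order of an int, the value of a multi-character constant, the escape-code values in Table 10-1, and the signedness of plain char. All function names are hypothetical. `model_char_constant()` reimplements the packing rule described in the text instead of relying on a compiler's own multi-character constants, which are implementation-defined and may draw warnings on other compilers. The model keeps only the last four characters when given more than four; that behavior is an assumption of the model, not something the text states. On a host compiler, the endianness and char-signedness results reflect that compiler's choices, not armcc's.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte stored at the lowest address of the int 0x01020304: 0x04 on a
   little-endian memory system, 0x01 on a big-endian one. Assumes a
   4-byte int, as the text states for the ARM execution environment. */
unsigned lowest_address_byte(void) {
    int value = 0x01020304;
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);   /* inspect the object representation */
    return bytes[0];
}

/* Model of the multi-character constant rule: each character shifts the
   earlier ones up one byte, so the last character ends up in the lowest-order
   byte and unused high-order bytes stay 0 (NUL). The 32-bit shift keeps only
   the last four characters (an assumption of this model). */
uint32_t model_char_constant(const char *chars, size_t n) {
    uint32_t value = 0;
    for (size_t i = 0; i < n; ++i)
        value = (value << 8) | (unsigned char)chars[i];
    return value;
}

/* Escape sequences from Table 10-1 paired with their documented values. */
struct escape_code {
    char ch;
    int value;
};

static const struct escape_code escape_codes[] = {
    {'\a', 7},  {'\b', 8},  {'\t', 9},  {'\n', 10},
    {'\v', 11}, {'\f', 12}, {'\r', 13},
    {'\x41', 0x41}, {'\101', 0101},   /* hex and octal forms of 'A' */
};

/* Returns 1 if every escape sequence above has its documented value. */
int escape_codes_match(void) {
    for (size_t i = 0; i < sizeof escape_codes / sizeof escape_codes[0]; ++i)
        if (escape_codes[i].ch != escape_codes[i].value)
            return 0;
    return 1;
}

/* Returns 1 if plain char is signed under the current compiler settings.
   With armcc's default (unsigned char) this is 0; --signed_chars makes it 1.
   Other compilers choose their own default. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}
```

For example, `model_char_constant("ab", 2)` gives 0x6162: 'b' sits in the lowest-order byte and 'a' in the next byte up, with the two high-order bytes left as NUL.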
Non-Confidential ARM DUI0375H
Copyright © 2007, 2008, 2011, 2012, 2014-2016 ARM. All rights reserved. 