Character sets and identifiers
The following points apply to the character sets and identifiers
expected by the compiler:
Uppercase and lowercase characters are distinct in all internal and external
identifiers. An identifier can also contain a dollar ($) character unless the
--strict compiler option is specified.
To permit dollar signs in identifiers with the --strict option,
also use the --dollar command-line option.
Calling setlocale(LC_CTYPE, "ISO8859-1") makes the character classification
functions, such as islower(), behave as expected over the full 8-bit Latin-1
alphabet, rather than over the 7-bit ASCII subset. The locale must be selected
at link time.
Source files are compiled according to the currently
selected locale. You might have to select a different locale, with
the --locale command-line option, if the source
file contains non-ASCII characters. See Compiler
command-line options listed by group in Using
the Compiler for more information.
The compiler supports multibyte character sets,
such as Unicode.
Other properties of the source character set are host-specific.
The properties of the execution character set are target-specific.
The ARM C and C++ libraries support the ISO 8859-1 (Latin-1 Alphabet)
character set with the following consequences:
The execution character set
is identical to the source character set.
There are eight bits in a character in the execution character set.
There are four characters (bytes) in an int. If the memory system is:
Little-endian: The bytes are ordered from least significant at the lowest
address to most significant at the highest address.
Big-endian: The bytes are ordered from least significant at the highest
address to most significant at the lowest address.
In C, all character constants have type int.
In C++, a character constant containing one character has type char, and
a character constant containing more than one character has type int.
Up to four characters of the constant are represented in the integer value.
The last character in the constant occupies the lowest-order byte of the
integer value. Up to three preceding characters are placed at higher-order
bytes. Unused bytes are filled with the null character (\0).
Table 31 lists the character escape codes. All integer character constants
that contain a single character or character escape sequence are represented
in both the source and execution character sets.
Table 31. Character escape codes
| Escape sequence | Char value | Description |
| \n | 10 | New line (line feed) |
| \xnn | 0xnn | ASCII code in hexadecimal |
| \nnn | 0nnn | ASCII code in octal |
Characters of the source character set in string
literals and character constants map identically into the execution
character set.
Data items of type char are unsigned by default. They can be explicitly
declared as signed char or unsigned char.
Care must be taken when mixing translation units that have been compiled
with different settings for the signedness of plain char and that share
interfaces or data structures.
The ARM ABI defines char as an unsigned byte,
and this is the interpretation used by the C++ libraries supplied
with the ARM compilation tools.
No locale is used to convert multibyte characters
into the corresponding wide characters for a wide character constant.
This is not relevant to the generic implementation.