Wide Character Strings
Last week we looked at Unicode escape sequences in C string and
NSString literals. Today we'll take a quick look at wide character
strings and talk about where they fit into iOS development.
When the C language was developed in the early 1970s, little thought
was given to representing non-English languages. By default, most C
compilers assumed that both code files and application output used
7-bit ASCII encoding and that each logical character in a string fit
into a single 8-bit byte or char value. By the time C was
first standardized by ANSI in 1989 (and by ISO in 1990), the need to
handle many more characters than ASCII was obvious, but the Unicode
standard was still nascent. So the ANSI C committee included a wide
character type and wide character string functions in the C89 standard,
but didn't tie wide character support to any specific character
encoding scheme.
wchar_t
C89 introduced a new integer type, wchar_t. This is similar to a char,
but typically "wider". On many systems, including Windows, a wchar_t
is 16 bits. This is typical of systems that implemented their Unicode
support using earlier versions of the Unicode standard, which
originally defined no more than 65,536 characters. Unicode was later
expanded to support historical and special purpose character sets, so
on some systems, including Mac OS X and iOS, the wchar_t type is 32
bits in size. This is often poorly documented, but you can use a
simple test like this to find out:
// how big is wchar_t?
NSLog(@"wchar_t is %zu bits wide", 8 * sizeof(wchar_t));
On a Mac or iPhone, this will print "wchar_t is 32 bits wide".
In C, wchar_t is a typedef for another integer type. In C++, wchar_t
is a built-in integer type. In practice, this means you need to
#include <wchar.h> in C when using wide characters.
signed or unsigned?
The char integer type is usually a signed integer with a range from
-128 to 127, although the C standard leaves that choice to the
compiler. You can use the CHAR_MIN and CHAR_MAX constants defined in
<limits.h> to find out the range for a particular compiler:
NSLog(@"CHAR_MIN = %0.f", (double)CHAR_MIN);
NSLog(@"CHAR_MAX = %0.f", (double)CHAR_MAX);
The wchar_t type can be signed or unsigned. The WCHAR_MIN and
WCHAR_MAX constants hold the range of a wchar_t and are defined in
both <wchar.h> and <stdint.h>.
NSLog(@"WCHAR_MIN = %0.f", (double)WCHAR_MIN);
NSLog(@"WCHAR_MAX = %0.f", (double)WCHAR_MAX);
On Windows, wchar_t is an unsigned 16-bit integer. On Mac and iPhone,
wchar_t is a signed 32-bit integer, so the code above will print out
"WCHAR_MIN = -2147483648" and "WCHAR_MAX = 2147483647". For the most
part you don't need to worry about whether wchar_t is signed or
unsigned; it only becomes important if you need to do comparisons and
operations that mix wchar_t with other integer types (a rarity).
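If you would rather have the compiler answer the signed-or-unsigned
question directly, one portable trick is to convert -1 to wchar_t and
check whether it is still negative. This is only a sketch in plain C;
wchar_is_signed() and describe_wchar() are made-up helper names for
illustration, not part of any library.

```c
#include <stdio.h>
#include <wchar.h>

/* Returns 1 if wchar_t is a signed type on this compiler, 0 otherwise.
   Converting -1 to an unsigned type wraps around to a large positive
   value, so the comparison can only be true for a signed type. */
int wchar_is_signed(void)
{
    return (wchar_t)-1 < 0;
}

/* Hypothetical helper: prints the width and signedness of wchar_t. */
void describe_wchar(void)
{
    printf("wchar_t is %zu bits wide and %s\n",
           8 * sizeof(wchar_t),
           wchar_is_signed() ? "signed" : "unsigned");
}
```

Calling describe_wchar() from main() prints a single line describing
the type on whatever platform you compile for. Some compilers may warn
that the comparison is always false when wchar_t is unsigned; that
warning is harmless here.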
wide character literals
We looked at C string literals in previous entries. Wide character
string literals are very similar, but are prefixed with 'L':
// example of a wide character string literal
wchar_t const *s = L"foobarf!";
Like C string literals, wide strings separated by only whitespace are
considered one logical string:
// wide strings written in segments
wchar_t const *s1 = L"foo" "bar";
wchar_t const *s2 = L"Hello, " L"world!";
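To make the difference from a plain C string concrete, here is a small
sketch showing that a wide string literal is an array of wchar_t
elements, so its size in bytes depends on how wide wchar_t is on your
platform. The names greeting, greeting_elements() and greeting_bytes()
are invented for this example.

```c
#include <stddef.h>
#include <wchar.h>

/* A wide string built from two adjacent wide literals. */
static const wchar_t greeting[] = L"Hello, " L"world!";

/* Number of wchar_t elements in the array, including the
   terminating L'\0' that the compiler appends. */
size_t greeting_elements(void)
{
    return sizeof(greeting) / sizeof(greeting[0]);
}

/* Total storage used by the array, in bytes. */
size_t greeting_bytes(void)
{
    return sizeof(greeting);
}
```

The string has 13 visible characters, so greeting_elements() returns 14
once the terminator is counted. With a 32-bit wchar_t that works out to
56 bytes; with a 16-bit wchar_t it would be 28.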
wide character functions
Most string functions in the standard C library are defined in the
<string.h> header. A very similar set of functions for wide character
strings is defined in <wchar.h>. The functions follow a similar naming
convention. Where string functions are prefixed with str, the wide
character equivalents are prefixed with wcs (for wide character
string). So the strlen() function calculates the length of a string
and the corresponding wcslen() function calculates the length of a
wide character string.
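As a quick illustration of the naming convention, the sketch below
pairs strlen() with wcslen() and then uses wcscpy(), wcscat() and
wcscmp(), the wide counterparts of strcpy(), strcat() and strcmp().
The wrapper function names are made up for this example; the library
calls themselves are standard C.

```c
#include <string.h>
#include <wchar.h>

/* Length of a narrow string, counted in char elements. */
size_t narrow_length(void)
{
    return strlen("foobar");
}

/* Length of a wide string, counted in wchar_t elements (not bytes). */
size_t wide_length(void)
{
    return wcslen(L"foobar");
}

/* Builds L"foobar" in a local buffer with the wide copy and
   concatenate functions, then compares it against a literal.
   Returns 1 if the two strings match. */
int wide_concat_matches(void)
{
    wchar_t buffer[16];          /* room for 6 characters plus terminator */
    wcscpy(buffer, L"foo");
    wcscat(buffer, L"bar");
    return wcscmp(buffer, L"foobar") == 0;
}
```

Both length functions return 6 here, because wcslen() counts wide
characters rather than bytes.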
not used much
In practice, you won't use wide character strings very often in
Objective-C since the NSString class does just about everything wide
character strings are meant to do, but you may occasionally run across
them in other C libraries.
Next time, we'll begin looking at common string operations using C
strings and NSStrings, starting with string concatenation.