String Comparison And Equality
Welcome back after an end-of-summer hiatus. Last time we looked at concatenating strings in Objective-C. Today we look at another common string operation: comparison.
Identity
Comparing two variables or objects can sometimes be a tricky proposition. There are several different senses of equality. The most fundamental type of equality is identity: do two variables represent the same thing in memory? Identity only makes sense for reference types, like C strings, NSStrings and other pointer types. Value types like ints always designate separate things in memory. In C and Objective-C, identity equality is determined by comparing pointer values using the == operator.
// comparing two strings for identity
char const *s1 = "foo";
char const *s2 = s1;
if (s1 == s2) {
NSLog(@"s1 is identical to s2");
}
NSString *s3 = @"foo";
NSString *s4 = @"bar";
if (s3 != s4) {
NSLog(@"s3 is not identical to s4");
}
Equivalence
A more useful type of equality is equivalence of value: do two
variables represent equivalent data. Equivalence is useful when
comparing value types as well as reference types, and is usually what
programmers think of when comparing two strings.
For C strings, the primary equivalence test is done with the strcmp() function. The strcmp() function compares the data of two C strings char by char; if two C strings represent the same sequence of char values in memory, they are equivalent and strcmp() returns zero.
// checking two C strings for equal value
char const *s1 = "foo";
char const *s2 = "bar";
if (strcmp(s1, s2) == 0) {
NSLog(@"s1 is equivalent to s2");
} else {
NSLog(@"s1 is not equivalent to s2");
}
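To make the contrast between identity and equivalence concrete, here is a minimal sketch in C. The helper names same_identity() and same_value() are made up for this illustration; only strcmp() comes from the standard library.

```c
#include <stdbool.h>
#include <string.h>

/* Identity: do both pointers refer to the same location in memory?
   (hypothetical helper, named for this example only) */
bool same_identity(char const *a, char const *b) {
    return a == b;
}

/* Equivalence: do both C strings hold the same sequence of chars?
   (hypothetical helper, named for this example only) */
bool same_value(char const *a, char const *b) {
    return strcmp(a, b) == 0;
}
```

Given two separate arrays declared as char a[] = "foo"; and char b[] = "foo";, same_identity(a, b) is false because the arrays live at different addresses, while same_value(a, b) is true because their contents match.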
In addition to checking for equivalence, strcmp() also categorizes the sort order of the two C strings. If the first argument comes before the second, a negative value is returned; if the first argument comes after the second, a positive value is returned. The strcmp() function uses a lexicographic comparison, which means that the comparison is strictly on the basis of the integer values of the chars in the C strings. For ASCII strings, the string "2" (ASCII code 50) comes before "A" (ASCII code 65), which precedes "a" (ASCII code 97). Many sorting algorithms, including the qsort() function in the C standard library, require a comparison function that follows the same return value convention as strcmp().
// using strcmp() result
int compareResult = strcmp(s1, s2);
if (compareResult < 0) {
NSLog(@"s1 comes before s2");
} else if (compareResult > 0) {
NSLog(@"s1 comes after s2");
}
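As a rough sketch of how strcmp() plugs into qsort(), the wrapper below adapts it to the comparator signature that qsort() expects. The names compare_c_strings() and sort_c_strings() are invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() hands the comparator pointers to the array elements. Each
   element here is a char const *, so we dereference once to reach
   the strings themselves. (hypothetical helper) */
static int compare_c_strings(void const *lhs, void const *rhs) {
    char const *a = *(char const * const *)lhs;
    char const *b = *(char const * const *)rhs;
    return strcmp(a, b);
}

/* Sorts an array of C strings in place, in strcmp() order.
   (hypothetical helper) */
void sort_c_strings(char const **strings, size_t count) {
    qsort(strings, count, sizeof strings[0], compare_c_strings);
}
```

Sorting the array { "a", "A", "2" } this way yields "2", "A", "a", matching the ASCII ordering described above.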
Sometimes you only want to see if two strings have a common prefix, or you're working with character buffers that aren't null terminated. The strncmp() function will compare a limited number of characters, stopping early if it encounters a null terminator in either string. Thus these two strings are equivalent when the first three characters are compared:
if (strncmp("foo", "fooey", 3) == 0) {
NSLog(@"both start with foo");
}
// prints "both start with foo"
When sorting with strncmp(), short strings come first:
if (strncmp("foo", "fooey", 5) < 0) {
NSLog(@"foo comes before fooey");
}
// prints "foo comes before fooey"
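A common use of strncmp() is a prefix test. The sketch below wraps it in a helper; has_prefix() is a name invented for this example, not a standard function.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true when s begins with prefix. Only strlen(prefix) chars
   are compared, and strncmp() stops early at a null terminator, so a
   string shorter than the prefix never matches. (hypothetical helper) */
bool has_prefix(char const *s, char const *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

With this helper, has_prefix("fooey", "foo") is true, while has_prefix("fo", "foo") is false because the terminator of "fo" is reached before the third character.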
Case Insensitive
In languages that have upper and lower case letters, you often need to do a case insensitive comparison. The C standard library doesn't define a case insensitive string comparison function, but one is part of the POSIX standard, and most compiler vendors and operating systems include one. The POSIX version is called strcasecmp(). Most modern Unix and Linux systems (including iOS and Mac OS X) have strcasecmp() available in the standard library. Older Unix systems and other operating systems may call this function stricmp() or strcmpi(). There is usually also a length limited version called strncasecmp() or strnicmp().
The case insensitive comparison functions usually compare only ASCII characters, which limits their usefulness.
// case insensitive comparison
char const *s = "<HTML><HEAD>...";
if (strncasecmp(s, "<html>", 6) == 0) {
NSLog(@"looks like HTML");
}
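For whole-string comparisons, a small wrapper around strcasecmp() might look like the sketch below. It assumes a POSIX system where strcasecmp() is declared in <strings.h>; equals_ignoring_case() is a hypothetical name.

```c
#include <stdbool.h>
#include <strings.h>  /* POSIX header that declares strcasecmp() */

/* Returns true when a and b match, ignoring the case of letters.
   Assumes POSIX; other platforms may spell the function stricmp().
   (hypothetical helper) */
bool equals_ignoring_case(char const *a, char const *b) {
    return strcasecmp(a, b) == 0;
}
```

Here equals_ignoring_case("HTML", "html") is true. In the default "C" locale the function only folds ASCII letters, so the UTF-8 bytes for "é" and "É" still compare as different.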
Encoding Issues
The strcmp() function was created in the era when most computers used ASCII or other simple single byte encodings. In ASCII, there is only one byte sequence that represents any particular character sequence. This isn't true of many modern encodings, including Unicode. The Unicode character set contains both precomposed accented characters such as "é" and a combining accent character "´", so there are two ways to represent "é" in UTF-8 encoding:
Form | Characters | Address 64 | Address 65 | Address 66 |
Precomposed | 'é' | 195 | 169 | |
Decomposed | 'e' then '´' | 101 | 204 | 129 |
Obviously a lexicographic comparison function like strcmp() will not see these two strings as equivalent. Accounting for this requires performing normalization on the Unicode characters in the string before doing the comparison. Unicode has several different types of normalization, which we won't dive into here. If you need to do a lot of low level processing of UTF-8 or other Unicode encoded text, you should look at the International Components for Unicode, a library of C functions for Unicode processing that is included as part of iOS. Better yet, in most cases you should use NSStrings when working with text.
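The mismatch is easy to reproduce in plain C. This sketch spells out both UTF-8 byte sequences from the table with hex escapes and compares them with strcmp(); the constant and function names are invented for the example.

```c
#include <stdbool.h>
#include <string.h>

/* Precomposed U+00E9 in UTF-8: bytes 195, 169. */
static char const E_ACUTE_PRECOMPOSED[] = "\xC3\xA9";

/* 'e' (byte 101) followed by combining acute U+0301 in UTF-8:
   bytes 204, 129. */
static char const E_ACUTE_DECOMPOSED[] = "e\xCC\x81";

/* Byte-for-byte equivalence, exactly what strcmp() tests.
   (hypothetical helper) */
bool bytes_match(char const *a, char const *b) {
    return strcmp(a, b) == 0;
}
```

Calling bytes_match(E_ACUTE_PRECOMPOSED, E_ACUTE_DECOMPOSED) returns false even though both strings render as "é". The strings do not even have the same strlen(): two bytes versus three.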
NSString equality
The NSString class defines the -isEqualToString: instance method for testing if an NSString is equivalent to another NSString:
// compare two NSStrings
NSString *s1 = @"foo";
NSString *s2 = @"bar";
if ( [s1 isEqualToString:s2] ) {
NSLog(@"The strings are equivalent.");
}
You can also use the -isEqual: instance method defined by NSObject to compare two NSStrings, or to compare an NSString with any other object:
// compare two NSStrings using -isEqual:
NSString *s1 = @"foo";
NSString *s2 = @"bar";
if ( [s1 isEqual:s2] ) {
NSLog(@"The strings are equivalent.");
}
The difference between the two methods is in their declarations. The -isEqualToString: method is only for comparing one NSString to another; its declaration looks like:
// declaration of -isEqualToString:
- (BOOL)isEqualToString:(NSString *)aString
The -isEqual: method is for comparing any kind of NSObject to another object; its declaration looks like:
// declaration of -isEqual:
- (BOOL)isEqual:(id)anObject
It's possible to use -isEqual: to compare an NSString with an object of a different type, such as an NSNumber:
NSString *fiveString = @"5";
NSNumber *fiveNumber = [NSNumber numberWithInt:5];
if ( [fiveString isEqual:fiveNumber] ) {
NSLog(@"fiveString equals fiveNumber");
} else {
NSLog(@"Strings aren't equivalent to numbers, silly!");
}
You might hope that the NSString "5" is equivalent to the NSNumber "5" but unfortunately they are not; the code above will print out "Strings aren't equivalent to numbers, silly!". In general, objects of different classes aren't considered to be equivalent, with one common exception: immutable classes like NSString can be equivalent to their mutable subclasses (NSMutableString in this case) and vice versa.
NSString *fiveString = @"5";
NSMutableString *fiveMutableString = [NSMutableString stringWithString:@"5"];
if ( [fiveString isEqual:fiveMutableString] ) {
NSLog(@"immutable and mutable strings can be equivalent");
}
And since NSMutableString is a subclass of NSString, you can also use -isEqualToString: to compare them:
if ( [fiveString isEqualToString:fiveMutableString] ) {
NSLog(@"immutable and mutable strings can be equivalent");
}
-compare:
In addition to testing for equivalence using -isEqual: or -isEqualToString:, you can also discover the relative order of two NSString objects using the -compare: family of methods. The -compare: method is very similar to the strcmp() function in C. The -compare: method returns an NSComparisonResult value, which is simply an integer value. Similar to strcmp(), -compare: will return zero if the two NSStrings are equivalent, though you can also use the constant NSOrderedSame instead of zero:
// compare two NSStrings
NSString *s1 = @"foo";
NSString *s2 = @"bar";
if ( [s1 compare:s2] == NSOrderedSame ) {
NSLog(@"s1 is equivalent to s2");
} else {
NSLog(@"s1 is not equivalent to s2");
}
Like strcmp(), if the receiver of the -compare: message (the first NSString) comes before the first argument (the second NSString), negative one is returned; if the receiver comes after the first argument, positive one is returned. The constants NSOrderedAscending and NSOrderedDescending can be used instead of -1 and 1 respectively.
// using NSComparisonResult
NSComparisonResult comparisonResult = [s1 compare:s2];
if (comparisonResult == NSOrderedAscending) {
NSLog(@"s1 comes before s2");
} else if (comparisonResult == NSOrderedDescending) {
NSLog(@"s1 comes after s2");
}
Case Insensitive -compare:
To test the equivalence of two NSString objects in a case insensitive manner, use -compare:options: with the NSCaseInsensitiveSearch flag.
// case insensitive compare
NSString *s1 = @"foo";
NSString *s2 = @"FOO";
if ( [s1 compare:s2 options:NSCaseInsensitiveSearch] == NSOrderedSame) {
NSLog(@"s1 is equivalent to s2");
}
Since case insensitive comparison is a common operation, NSString has a convenience method, -caseInsensitiveCompare:, which does the same thing.
// case insensitive compare
NSString *s1 = @"foo";
NSString *s2 = @"FOO";
if ( [s1 caseInsensitiveCompare:s2] == NSOrderedSame) {
NSLog(@"s1 is equivalent to s2");
}
Unicode and -compare:
By default, NSString is pretty smart about Unicode and automatically understands things like Unicode combining characters. For instance, you can represent é two ways, but NSString knows that they represent equivalent strings:
// comparing equivalent Unicode strings
NSString *eAcute = @"\u00e9"; // single character 'é'
NSString *ePlusAcute = @"e\u0301"; // 'e' + combining '´'
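// -compare: treats canonically equivalent sequences as equal; -isEqualToString: compares literally
if ( [eAcute compare:ePlusAcute] == NSOrderedSame ) {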
NSLog(@"'é' is equivalent to 'e' + '´'");
}
This can be surprising if you've only worked with ASCII or other single byte encodings. With NSString, you can't assume that equivalent strings have the same length and character sequence. Usually you don't care about the Unicode representation, but occasionally it's important. You can use the NSLiteralSearch flag along with -compare:options: to do a lexicographic comparison that compares strings character value by character value.
// lexicographic comparison of Unicode strings
if ( [eAcute compare:ePlusAcute options:NSLiteralSearch] != NSOrderedSame) {
NSLog(@"'é' is not lexicographically equivalent to 'e' + '´'");
}
combining options
The options constants used in the -compare:options: method are bit flags. You combine them using the bitwise or operator (|).
// using multiple options
NSString *eAcute = @"\u00e9"; // 'é'
NSString *capitalEAcute = @"\u00c9"; // 'É'
if ( [eAcute compare:capitalEAcute
options:NSCaseInsensitiveSearch | NSLiteralSearch]
== NSOrderedSame)
{
NSLog(@"'é' is equivalent to 'É'");
}
comparing substrings
If you only want to compare parts of two NSString objects, you can use the -compare:options:range: method and specify an NSRange structure. The NSRange structure is composed of two parts: a starting position field named location and a length field named length. Usually it's convenient to use the NSMakeRange() function to generate the NSRange.
// compare substrings
NSString *s1 = @"foo";
NSString *s2 = @"fooey";
// the range limits the receiver only, so the longer string must be the receiver
if ( [s2 compare:s1
options:0
range:NSMakeRange(0, 3)] == NSOrderedSame)
{
NSLog(@"both strings start with 'foo'");
}
You pass in zero for the options to use the default comparison. -compare:options:range: is similar to strncmp() with two important differences: the range limits only the receiver, while the argument is always compared in full; and the NSRange you give must fall completely inside the receiver or an NSRangeException will be thrown.
comparing using a specific locale
By default, the -compare: methods use the current locale to determine the ordering of two strings. The current locale is controlled by the user when they set their language and region for their iOS device. Most of the time you should respect the user's settings, but sometimes it's appropriate to compare strings using a fixed locale. Perhaps your app teaches French vocabulary and you want your French word list to sort in standard French order whether the user's phone is set to English, German or Japanese. In French, an accent difference near the end of a word takes precedence over an accent difference earlier in the word, thus "côte" should come before "coté". If you use the default locale, the result of comparing "coté" and "côte" varies but will probably not give you the correct ordering.
// compare using default locale
NSString *coteAcute = @"cot\u00e9"; // "coté"
NSString *coteCircumflex = @"c\u00f4te"; // "côte"
if ( [coteAcute compare:coteCircumflex] == NSOrderedAscending) {
NSLog(@"Not using a French locale");
}
To remedy this, you can set the locale explicitly when you do your
comparison:
// compare using specific locale
NSLocale *frenchLocale = [[[NSLocale alloc] initWithLocaleIdentifier:@"fr_FR"] autorelease];
NSComparisonResult comparisonResult = [coteAcute compare:coteCircumflex
options:0
range:NSMakeRange(0, 4)
locale:frenchLocale];
if (comparisonResult == NSOrderedDescending) {
NSLog(@"Using a French locale");
}
That sums up the options for comparing C strings and NSStrings. Next time, we'll look at slicing and dicing strings by creating substrings.