Searching In Strings
Last week we looked at creating substrings of C strings and NSStrings.
Today we look at another common string operation: searching within a
string.
Find a character in a C string
As with all operations on C strings, searching requires you to deal
with pointers. To find the first occurrence of a character in a C
string, use the strchr() function. If the character is found, a pointer
to that character is returned. If the character isn't present in the
string, NULL is returned.
// find a character in a C string
char const *s = "foobar";
char const *character = strchr(s, 'b');
if (character) {
    NSLog(@"Found b");
} else {
    NSLog(@"Didn't find b");
}
// prints "Found b"
As we saw last week when we looked at substrings, the pointer returned
by strchr() is effectively a substring of the source string starting at
the first occurrence of the character you were searching for:
char const *s = "foobar";
char const *substring = strchr(s, 'b');
if (substring) {
    NSLog(@"The substring is %s", substring);
}
// prints "The substring is bar"
Once you find the character you're looking for, it's common to want to
create a substring containing everything up to that position
in the string:
char const *filename = "myfile.txt";
char const *dot = strchr(filename, '.');
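if (dot) {
    size_t length = dot - filename;
    // calloc() zero-fills the buffer, so the copy is already terminated
    char *baseFilename = calloc(length + 1, sizeof(char));
    if (baseFilename) {
        strncpy(baseFilename, filename, length);
        NSLog(@"The base filename is %s", baseFilename);
        free(baseFilename);
    }
}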
// prints "The base filename is myfile"
You use the difference between the two string pointers to calculate the
number of chars up to (but not including) the character you searched
for. After allocating a buffer to hold the new substring (and the null
terminator), you use the strncpy() function to copy the first part of
the source string. Because we called calloc(), the last char in our
buffer is already set to zero; if you use malloc() or a fixed buffer
instead, you need to remember to set the null terminator since
strncpy() isn't guaranteed to do it for you.
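To make the malloc() case concrete, here is a minimal sketch of the same
idea in plain C. The helper name copy_prefix_before() is made up for this
example (it is not a standard library function), and the sketch assumes
the caller is responsible for freeing the result.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a newly allocated copy of everything in
// `s` before the first occurrence of `c`, or NULL if `c` isn't found or
// the allocation fails. The caller must free() the result.
char *copy_prefix_before(char const *s, char c) {
    char const *found = strchr(s, c);
    if (!found) {
        return NULL;
    }
    size_t length = (size_t)(found - s);
    // malloc() does not zero the buffer, unlike calloc()
    char *prefix = malloc(length + 1);
    if (prefix) {
        // copies exactly `length` chars and writes no terminator here
        strncpy(prefix, s, length);
        // so we set the null terminator ourselves
        prefix[length] = '\0';
    }
    return prefix;
}
```

Calling copy_prefix_before("myfile.txt", '.') should hand back a buffer
holding "myfile"; remember to free() it when you're done.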
Very often, you want to find the last occurrence of a character; you
can use the strrchr() function to search in reverse:
// find a character in reverse
char const *filename = "myfile.txt";
char const *extension = strrchr(filename, '.');
if (extension) {
    NSLog(@"The extension is %s", extension);
}
// prints "The extension is .txt"
Find one C string in another
To find the first occurrence of one C string in another, use the
strstr() function. Like strchr(), it returns a pointer to the first
occurrence of the string, or NULL if it wasn't found.
// find one C string in another
char const *s1 = "The quick brown fox";
char const *s2 = strstr(s1, "ick");
if (s2) {
    NSLog(@"Found ick");
} else {
    NSLog(@"Didn't find ick");
}
// prints "Found ick"
Unfortunately the C standard library doesn't have a strrstr() function
to search for the last occurrence of one string in another. You'll need
to roll your own by calling strstr() in a loop until you reach the end
of the string. (The implementation of this is left as an exercise for
the reader, or better yet, convert your C string to an NSString and
keep reading :-)
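If you'd rather skip the exercise, here is one possible sketch of that
loop. The name find_last() is invented for this example, and treating an
empty search string as a match at the end of the string is an assumption
on my part, not something the standard library dictates.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not part of the C standard library): returns a
// pointer to the last occurrence of `needle` in `haystack`, or NULL if
// there is none.
char const *find_last(char const *haystack, char const *needle) {
    if (*needle == '\0') {
        // Assumption: an empty needle matches at the end of the string.
        return haystack + strlen(haystack);
    }
    char const *last = NULL;
    char const *p = haystack;
    // Keep calling strstr() until it stops finding matches.
    while ((p = strstr(p, needle)) != NULL) {
        last = p;
        p++;  // step one char forward so overlapping matches count too
    }
    return last;
}
```

Subtracting the start of the string from the returned pointer gives you
the offset of the match, just like the pointer arithmetic we used with
strchr() earlier.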
C string encoding issues
The standard library functions for searching C strings work great with
ASCII and similar single byte encodings. If you need to search inside
UTF-8 encoded C strings, you'll quickly realize that strchr() and
strrchr() are only useful for finding the basic ASCII characters (which
are also valid UTF-8 characters). If you need to find non-ASCII
characters like 'é', you'll need to use strstr() to search for the byte
sequence that UTF-8 uses to represent it ("\xc3\xa9" for 'é'). Even
then, Unicode characters like 'é' can be represented two ways: as the
single Unicode character 'é' or as the base character 'e' followed by
the combining character '´'. In general, it's better to use a C library
designed to deal with the encoding, such as the International
Components for Unicode, for handling UTF-8 encoded strings. Or if
you're developing for iOS or Mac OS X, use NSString instead.
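As a rough illustration of the byte sequence approach and its blind
spot, here is a small sketch. The helper name offset_of_e_acute() is
hypothetical, and it only looks for the precomposed form of 'é'.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns the byte offset of the first precomposed
// 'é' (U+00E9, encoded in UTF-8 as the bytes 0xC3 0xA9) in `utf8`, or
// -1 if that byte sequence isn't present.
long offset_of_e_acute(char const *utf8) {
    char const *found = strstr(utf8, "\xc3\xa9");
    return found ? (long)(found - utf8) : -1L;
}
```

Given "caf\xc3\xa9" this returns 3, but given "cafe\xcc\x81" (an 'e'
followed by the combining accent U+0301) it returns -1, even though both
strings display as "café". The offset is also counted in bytes, not
characters.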
Find one NSString in another
The NSString class doesn't have separate methods to search for a single
character or a string; you use -rangeOfString: to do either:
// find a character in an NSString
NSString *s = @"foobar";
NSRange range = [s rangeOfString:@"b"];
if (range.location != NSNotFound) {
    NSLog(@"Found b at %lu", (unsigned long)range.location);
}
// prints "Found b at 3"
Searching for the last occurrence of a string is done using the related
method -rangeOfString:options: with the NSBackwardsSearch option.
// find last occurrence in an NSString
NSString *s = @"The rain in Spain falls mainly on the plain";
NSRange range = [s rangeOfString:@"ain" options:NSBackwardsSearch];
if (range.location != NSNotFound) {
    NSLog(@"Found ain at %lu", (unsigned long)range.location);
}
// prints "Found ain at 40"
The options are a combination of the following bit flags:
NSCaseInsensitiveSearch, NSLiteralSearch, NSBackwardsSearch and
NSAnchoredSearch. You use the bitwise OR (|) operator to combine them
together, or pass in zero for no options.
Use the NSCaseInsensitiveSearch option to find the first match,
ignoring the case of both strings. The NSLiteralSearch option is used
when you want to match a specific Unicode string form, such as the
single character 'é' (Unicode character U+00E9) and not match
equivalent character sequences like 'e' + '´' (Unicode characters
U+0065 and U+0301). Most applications won't care about this option, but
it's really handy when you need it.
NSAnchoredSearch checks for a match only at the start of the string (or
the end if combined with NSBackwardsSearch). This option is
occasionally handy, but the methods -hasPrefix: and -hasSuffix: are
easier-to-read equivalents.
// anchored search
NSString *s = @"The rain in Spain falls mainly on the plain";
NSRange range = [s rangeOfString:@"ain"
                         options:NSAnchoredSearch];
if (range.location == NSNotFound) {
    NSLog(@"Doesn't start with ain");
}
// prints "Doesn't start with ain"
// same thing using -hasPrefix:
if ( ! [s hasPrefix:@"ain"]) {
    NSLog(@"Doesn't have prefix ain");
}
// prints "Doesn't have prefix ain"
// now from the end
range = [s rangeOfString:@"ain"
                 options:NSAnchoredSearch | NSBackwardsSearch];
if (range.location != NSNotFound) {
    NSLog(@"Ends with ain");
}
// prints "Ends with ain"
// same thing using -hasSuffix:
if ([s hasSuffix:@"ain"]) {
    NSLog(@"Has suffix ain");
}
// prints "Has suffix ain"
There are two other variations of -rangeOfString:. The first,
-rangeOfString:options:range:, allows you to search within a section of
a larger string without having to create a substring.
The second, -rangeOfString:options:range:locale:, allows you to specify
a locale as well as a range. In most cases you want to use the current
locale, which is taken from the language setting on the user's device;
pass [NSLocale currentLocale] to get it. The other variations of
-rangeOfString: don't let you choose a locale. Note that Apple's
documentation says passing nil for the locale selects the system
locale, not the current one. Sometimes you know that the string
contains text in a particular language, in an app that teaches German
for instance. In this case you should specify a locale when searching
the string; the locale can affect how text is matched, especially when
using the NSCaseInsensitiveSearch option.
Next week, we'll look at replacing characters in C strings and
NSStrings.