
iOS - Most efficient way to find word occurrence count in a string

Given a string, I need to count how many times each word appears in it. My approach was to split the string into an array of words and search the array, but I suspect that searching the string directly would be more efficient. Below is the code I originally wrote to solve the problem. I'm open to suggestions for better solutions.

NSMutableDictionary *sets = [[NSMutableDictionary alloc] init];

NSString *paragraph = [[NSString alloc] initWithContentsOfFile:[[NSBundle mainBundle] pathForResource:@"text" ofType:@"txt"] encoding:NSUTF8StringEncoding error:NULL];

NSMutableArray *words = [[[paragraph lowercaseString] componentsSeparatedByString:@" "] mutableCopy];

while (words.count) {
    NSMutableIndexSet *indexSet = [[NSMutableIndexSet alloc] init];
    NSString *search = [words objectAtIndex:0];
    for (unsigned i = 0; i < words.count; i++) {
        if ([[words objectAtIndex:i] isEqualToString:search]) {
            [indexSet addIndex:i];
        }
    }
    [sets setObject:[NSNumber numberWithUnsignedInteger:indexSet.count] forKey:search];
    [words removeObjectsAtIndexes:indexSet];
}

NSLog(@"%@", sets);

Example:

Starting string:
"This is a test. This is only a test."

Results:

  • "This" - 2
  • "is" - 2
  • "a" - 2
  • "test" - 2
  • "only" - 1
RyJ asked Nov 13 '12 18:11



1 Answer

This is exactly what an NSCountedSet is for.

You need to break the string into words and add each one to the counted set, which keeps track of how many times each object has been added. Foundation already provides word-by-word enumeration of a string, so you don't have to handle punctuation yourself:

NSString     *string     = @"This is a test. This is only a test.";
NSCountedSet *countedSet = [NSCountedSet new];

[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                           options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){

                            // This block is called once for each word in the string.
                            [countedSet addObject:substring];

                            // If you want to ignore case, so that "this" and "This" 
                            // are counted the same, use this line instead to convert
                            // each word to lowercase first:
                            // [countedSet addObject:[substring lowercaseString]];
                        }];

NSLog(@"%@", countedSet);

// Results:  2012-11-13 14:01:10.567 Testing App[35767:fb03] 
// <NSCountedSet: 0x885df70> (a [2], only [1], test [2], This [2], is [2])
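
If you want the result as a word-to-count dictionary, as in the question, the counted set can be read back with countForObject:. The following is only a minimal sketch under a few assumptions: the helper name WordCounts is hypothetical (it is not part of the answer above), words are lowercased so that "This" and "this" count as the same word, and the code is built as a small Foundation command-line program.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): returns a dictionary
// mapping each lowercased word in `text` to the number of times it occurs.
static NSDictionary<NSString *, NSNumber *> *WordCounts(NSString *text) {
    NSCountedSet *countedSet = [NSCountedSet new];

    // Let Foundation split the text into words; punctuation is skipped.
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // Assumption: counting is case-insensitive.
        [countedSet addObject:substring.lowercaseString];
    }];

    // Fast enumeration visits each distinct word once;
    // countForObject: reports how many times it was added.
    NSMutableDictionary<NSString *, NSNumber *> *counts =
        [NSMutableDictionary dictionaryWithCapacity:countedSet.count];
    for (NSString *word in countedSet) {
        counts[word] = @([countedSet countForObject:word]);
    }
    return counts;
}

int main(void) {
    @autoreleasepool {
        NSDictionary *counts = WordCounts(@"This is a test. This is only a test.");
        NSLog(@"%@", counts);
    }
    return 0;
}
```

Each word is visited once while enumerating the string and once more while building the dictionary, so this avoids rescanning the word array for every distinct word the way the loop in the question does. The order of the keys in the logged dictionary is not guaranteed.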
lnafziger answered Oct 26 '22 09:10
