 

Objective-C: Find the most commonly used words in an NSString

I am trying to write a method:

- (NSDictionary *)wordFrequencyFromString:(NSString *)string {}

where the returned dictionary maps each word to the number of times it appears in the string provided. Unfortunately, I can't find a way to iterate over the words in a string to analyze each one, only over each character, which seems like more work than necessary. Any suggestions?

startuprob asked Sep 11 '11 19:09



2 Answers

NSString has the -enumerateSubstringsInRange:options:usingBlock: method, which lets you enumerate all the words directly and leaves the work of finding word boundaries to the standard API:

[s enumerateSubstringsInRange:NSMakeRange(0, [s length])
                      options:NSStringEnumerationByWords
                   usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
                       NSLog(@"%@", substring);
                   }];

In the enumeration block you can either update an NSMutableDictionary that uses the words as keys and NSNumber counts as values, or add each word to an NSCountedSet, which keeps the counts for you.
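Putting the enumeration and the counted set together, the method from the question could look roughly like the sketch below. It is written as a free function with a hypothetical name, WordFrequencyFromString, so that it stands alone; lowercasing each word is an assumption made here so that "The" and "the" are counted together, and it can be dropped if case matters.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the asker's -wordFrequencyFromString: method.
// Returns a dictionary mapping each word to an NSNumber holding its count.
static NSDictionary *WordFrequencyFromString(NSString *string) {
    NSCountedSet *counts = [NSCountedSet set];

    [string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                               options:NSStringEnumerationByWords
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        // Assumption: fold case so that "The" and "the" are the same word.
        [counts addObject:[substring lowercaseString]];
    }];

    // Copy the counts into a dictionary to match the requested return type.
    NSMutableDictionary *result = [NSMutableDictionary dictionaryWithCapacity:[counts count]];
    for (NSString *word in counts) {
        [result setObject:[NSNumber numberWithUnsignedInteger:[counts countForObject:word]]
                   forKey:word];
    }
    return result;
}
```

Calling WordFrequencyFromString(@"The cat and the hat") should give a dictionary in which @"the" maps to 2 and each of the other words maps to 1.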

Vladimir answered Nov 29 '22 23:11



You can use componentsSeparatedByCharactersInSet: to split the string and NSCountedSet will count the words for you.

1) Split the string into words using the union of the punctuation, whitespace and newline character sets:

NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
[separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

NSArray *words = [myString componentsSeparatedByCharactersInSet:separators];

2) Count the occurrences of the words (if you want to disregard capitalization, you can do NSString *myString = [originalString lowercaseString]; before splitting the string into components):

NSCountedSet *frequencies = [NSCountedSet setWithArray:words];
NSUInteger aWordCount = [frequencies countForObject:@"word"];

If you are willing to change your method signature, you can just return the counted set.
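The two steps above can be combined into one sketch that also ranks the words, since the question title asks for the most commonly used ones. The function name WordsByDescendingFrequency is hypothetical. Two details are worth noting: componentsSeparatedByCharactersInSet: returns empty strings wherever two separators are adjacent (as in ", "), so those are skipped here, and because the apostrophe is in the punctuation set, a word like "don't" is split in two.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the distinct words of `text`, most frequent first.
static NSArray *WordsByDescendingFrequency(NSString *text) {
    NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
    [separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    // Lowercase first so that capitalization is disregarded.
    NSArray *pieces = [[text lowercaseString] componentsSeparatedByCharactersInSet:separators];

    NSCountedSet *frequencies = [NSCountedSet set];
    for (NSString *piece in pieces) {
        // Adjacent separators produce empty strings; do not count them as words.
        if ([piece length] > 0) {
            [frequencies addObject:piece];
        }
    }

    return [[frequencies allObjects] sortedArrayUsingComparator:^NSComparisonResult(NSString *a, NSString *b) {
        NSUInteger countA = [frequencies countForObject:a];
        NSUInteger countB = [frequencies countForObject:b];
        if (countA != countB) {
            // Higher counts sort first.
            return countA > countB ? NSOrderedAscending : NSOrderedDescending;
        }
        // Alphabetical tie-break keeps the order deterministic.
        return [a compare:b];
    }];
}
```

The first element of the returned array is then the most common word, and the counted set can still be returned directly if the counts themselves are needed.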

albertamg answered Nov 29 '22 22:11
