
Objective-C: NSLinguisticTagger "new york" vs "New York"

I just started playing around with NSLinguisticTagger, basing my code on this blog post: NSLinguisticTagger @ NSHipster.com

// Skip whitespace and punctuation, and join multi-word names into one token.
NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerJoinNames;
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:[NSLinguisticTagger availableTagSchemesForLanguage:@"en"] options:options];
tagger.string = question;
// Log each token together with the tag assigned to it.
[tagger enumerateTagsInRange:NSMakeRange(0, [question length]) scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass options:options usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
    NSString *token = [question substringWithRange:tokenRange];
    NSLog(@"%@: %@", token, tag);
}];

When I run this with question = @"Weekend in New York", "New York" gets tagged as PlaceName, which is great. But when I run it with question = @"Weekend in new york", "new" gets tagged as Adjective and "york" gets tagged as PlaceName. Is there any way around this, so that "New York" and "new york" both get tagged as PlaceName?

I'm totally new to this linguistics thing.

Ayaka Nonaka asked Feb 18 '13 14:02


2 Answers

Taking this topic a little further: correct capitalization of first and last names is a requirement for NSLinguisticTagger to identify names.

After several hours of frustration, I decided to create various tests with uppercase, lowercase and capitalized-case words.

The NSLinguisticTagger gave different results in almost all of the tests.

When the NSLinguisticTagger parses a string in capitalized case, almost all nouns are tagged as PersonalName.

It was very frustrating.

The lesson I want to share is that the NSLinguisticTagger can guess at the tags it places on words, but in the end it is just a grammatical evaluation of words. The evaluation depends on proper language constructs such as word placement and whether a word is capitalized.

I am still finding it a useful class, but the moral of this post is to "Be Proper".

When parsing text, we programmers sometimes have a tendency to play with uppercasing and lowercasing to simplify our work. We can still do this, but keep in mind that word casing does change the NSLinguisticTagger results.
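The kind of casing test described above could be sketched roughly as follows. The helper names LogTags, CasingVariants and CompareCasings are hypothetical, the tagger setup is copied from the question, and the tags that get logged will depend on the OS version, so no output is shown here.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: logs every token in `text` with the tag assigned to it.
static void LogTags(NSString *text) {
    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace |
                                        NSLinguisticTaggerOmitPunctuation |
                                        NSLinguisticTaggerJoinNames;
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:[NSLinguisticTagger availableTagSchemesForLanguage:@"en"]
                   options:options];
    tagger.string = text;
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange,
                                   NSRange sentenceRange, BOOL *stop) {
        NSLog(@"%@: %@", [text substringWithRange:tokenRange], tag);
    }];
}

// Hypothetical helper: the casings to compare (as typed, lowercase,
// uppercase, and every word capitalized).
static NSArray<NSString *> *CasingVariants(NSString *text) {
    return @[text, text.lowercaseString, text.uppercaseString, text.capitalizedString];
}

// Hypothetical helper: tags each casing variant in turn so the logs can be compared.
static void CompareCasings(NSString *text) {
    for (NSString *variant in CasingVariants(text)) {
        NSLog(@"--- %@", variant);
        LogTags(variant);
    }
}
```

Calling CompareCasings(@"Weekend in New York") logs four runs, one per casing, which makes it easy to see where the tags diverge.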

Michael Colon answered Oct 20 '22 15:10


This has already been mentioned in the comments, but I wanted to point it out anyway. NSLinguisticTagger treats "New York" and "new york" as different because they are: the capital N tells it that the word is a proper noun. To my knowledge, nothing in NSLinguisticTagger can change this behavior.

However, what you can do is rely on iOS autocorrect. Just make sure that the text field where the value is being entered has autocorrect enabled, and it should automatically correct "new york" to "New York", and similar occurrences. If autocorrect doesn't catch this, then I would try to find some other library for linguistic analysis.
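A minimal sketch of that setup, assuming the question is typed into a UITextField. The helper name MakeQuestionField is hypothetical; autocorrectionType and autocapitalizationType are the standard UITextInputTraits properties. Whether autocorrect actually rewrites "new york" depends on the user's keyboard settings and language.

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper: creates a text field with autocorrection and
// sentence-level autocapitalization explicitly switched on.
static UITextField *MakeQuestionField(void) {
    UITextField *field = [[UITextField alloc] initWithFrame:CGRectZero];
    field.autocorrectionType = UITextAutocorrectionTypeYes;
    field.autocapitalizationType = UITextAutocapitalizationTypeSentences;
    return field;
}
```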

Retroactive autocorrect is already included in iOS (to a certain extent), so that should be good enough to correct "new york" to "New York". If you want to correct the whole sentence (i.e. "weekend in new york" to "Weekend in New York"), you would need to implement that functionality yourself. This shouldn't be terribly difficult, as there are just a few simple grammar rules you must follow, and many things will be picked up by autocorrect.
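As an illustration of one such rule, here is a rough sketch that capitalizes only the first letter of the input and leaves the rest as typed. The helper name CapitalizeFirstLetter is hypothetical, and it assumes the input is a single sentence and that proper nouns such as "New York" have already been fixed by autocorrect.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: uppercases the first character of `text` and leaves
// every other character untouched.
static NSString *CapitalizeFirstLetter(NSString *text) {
    if (text.length == 0) {
        return text;
    }
    // Use the composed character range so multi-unit characters stay intact.
    NSRange first = [text rangeOfComposedCharacterSequenceAtIndex:0];
    NSString *head = [[text substringWithRange:first] uppercaseString];
    return [text stringByReplacingCharactersInRange:first withString:head];
}
```

For example, CapitalizeFirstLetter(@"weekend in New York") returns @"Weekend in New York". Capitalizing every word with capitalizedString is deliberately avoided here, since it would also turn "in" into "In".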

Hope this helps, let me know if you need more information.

futurevilla216 answered Oct 20 '22 13:10