Word Stemming in iOS - Not Working for a Single Word

I am using NSLinguisticTagger for word stemming. I can get the stems of the words in a sentence, but not the stem of a single word.

Here is the code I am using:

    NSString *stmnt = @"i waited";
    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerJoinNames;

    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLemma] options:options];
    tagger.string = stmnt;
    [tagger enumerateTagsInRange:NSMakeRange(0, [stmnt length]) scheme:NSLinguisticTagSchemeLemma options:options usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        NSString *token = [stmnt substringWithRange:tokenRange];
        NSLog(@"%@: %@", token, tag);
    }];

For this input I correctly get:

i: i
waited: wait

But the same code fails to find the stem when stmnt = @"waited";

Any help is greatly appreciated.

Asked Jun 25 '14 07:06 by Ab'initio


3 Answers

The following code worked for me:

    NSString *stmt = @"waited";
    NSRange stringRange = NSMakeRange(0, stmt.length);
    // Declare the text explicitly as English written in Latin script
    NSDictionary *languageMap = @{@"Latn" : @[@"en"]};
    [stmt enumerateLinguisticTagsInRange:stringRange
                                  scheme:NSLinguisticTagSchemeLemma
                                 options:NSLinguisticTaggerOmitWhitespace
                             orthography:[NSOrthography orthographyWithDominantScript:@"Latn" languageMap:languageMap]
                              usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        // Log info to console for debugging purposes
        NSString *currentEntity = [stmt substringWithRange:tokenRange];
        NSLog(@"%@ is a %@, tokenRange {location %lu, length %lu}", currentEntity, tag,
              (unsigned long)tokenRange.location, (unsigned long)tokenRange.length);
    }];

Answered Oct 06 '22 00:10 by Ab'initio
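
One plausible reason the answer above works is that it supplies an explicit NSOrthography, so the tagger does not have to guess the language from a single word. A minimal Swift sketch of the same idea applied to a tagger instance (as in the question's code) calls setOrthography(_:range:) before enumerating. It assumes Apple's Foundation; the helper name lemmas(in:) is hypothetical, and the canImport(Darwin) guard is there only because NSLinguisticTagger is an Apple-platform API.

```swift
import Foundation

#if canImport(Darwin)
/// Hypothetical helper: returns (token, lemma) pairs for `text`,
/// declaring the text as English instead of relying on language detection.
func lemmas(in text: String) -> [(token: String, lemma: String?)] {
    let nsText = text as NSString
    let fullRange = NSRange(location: 0, length: nsText.length)

    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = text
    // Same orthography as the answer above: Latin script, English.
    let orthography = NSOrthography(dominantScript: "Latn", languageMap: ["Latn": ["en"]])
    tagger.setOrthography(orthography, range: fullRange)

    var pairs: [(token: String, lemma: String?)] = []
    tagger.enumerateTags(in: fullRange,
                         scheme: .lemma,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, tokenRange, _, _ in
        // tag is nil when the tagger has no lemma for this token
        pairs.append((token: nsText.substring(with: tokenRange), lemma: tag?.rawValue))
    }
    return pairs
}

// Usage: print each token with its lemma ("nil" if none was found).
for pair in lemmas(in: "waited") {
    print("\(pair.token): \(pair.lemma ?? "nil")")
}
#endif
```

Setting the orthography on the tagger keeps the question's structure (one reusable tagger) while applying the fix from the answer.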


The accepted answer converted to Swift for those who need it:

    import Foundation

    let stmt = "waited"
    let nsStmt = stmt as NSString  // the NSRange-based APIs live on NSString
    let options: NSLinguisticTagger.Options = .omitWhitespace
    let stringRange = NSRange(location: 0, length: nsStmt.length)
    let languageMap = ["Latn": ["en"]]
    let orthography = NSOrthography(dominantScript: "Latn", languageMap: languageMap)

    nsStmt.enumerateLinguisticTags(
        in: stringRange,
        scheme: .lemma,
        options: options,
        orthography: orthography
    ) { tag, tokenRange, _, _ in
        let currentEntity = nsStmt.substring(with: tokenRange)
        print(">\(currentEntity):\(tag?.rawValue ?? "nil")")
    }

Answered Oct 06 '22 00:10 by Craig Grummitt


It doesn't work for a single word because there isn't enough information to determine its role in the sentence.

In our case, when a user enters a single word into our natural-language parser, we assume it's the name of a thing, and thus a noun.

So we construct a sentence in which the entered word is implied to be a noun, like so:

    let str = "please show me \(word)"

Then just run it through NSLinguisticTagger as usual.

Answered Oct 06 '22 00:10 by Vojto
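
The carrier-sentence approach from the last answer can be sketched in Swift as follows, assuming Apple's Foundation. The prefix "please show me " comes from that answer; the helper names carrierSentence(for:) and lemma(ofSingleWord:) are hypothetical. The sketch tags the whole sentence but keeps only the tag of the token whose range equals the original word's range, so the lemmas of the filler words are ignored. Ranges are measured in UTF-16 units via NSString, because that is what NSRange-based APIs use.

```swift
import Foundation

/// Filler text from the answer above; it frames the word as the object of "show".
let carrierPrefix = "please show me "

/// Hypothetical helper: builds the carrier sentence and returns the UTF-16
/// range (as used by NSRange APIs) that the original word occupies in it.
func carrierSentence(for word: String) -> (sentence: String, wordRange: NSRange) {
    let sentence = carrierPrefix + word
    let location = (carrierPrefix as NSString).length
    let length = (word as NSString).length
    return (sentence, NSRange(location: location, length: length))
}

#if canImport(Darwin)
/// Hypothetical helper: tags the carrier sentence with the lemma scheme and
/// keeps only the tag of the token that exactly covers the original word.
func lemma(ofSingleWord word: String) -> String? {
    let (sentence, wordRange) = carrierSentence(for: word)
    let fullRange = NSRange(location: 0, length: (sentence as NSString).length)

    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = sentence

    var result: String?
    tagger.enumerateTags(in: fullRange,
                         scheme: .lemma,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, tokenRange, _, stop in
        if tokenRange == wordRange {
            result = tag?.rawValue
            stop.pointee = true  // the word is the last token; nothing left to check
        }
    }
    return result
}

// Usage: look up the lemma of a word the user typed on its own.
if let stem = lemma(ofSingleWord: "cars") {
    print(stem)
}
#endif
```

Matching on the exact token range, rather than taking the last tag, means a word the tagger splits into several tokens yields nil instead of the lemma of a fragment. Because the sentence frames the word as a noun, verbs entered on their own may be lemmatized differently than they would be in a verb context.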