Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to search all words from input string in ios?

Say, I have vocabulary (around 100000 words) and word ("inputstring"). So:

I need generate all words from "inputstring" like "input", "string", "put", "strinpg" etc. And then I need check out them in the my vocabulary. Could you say any good algorithm for that? Because I have only idea with:

  1. Recursive search all possible combinations at the step 1
  2. Using NSPredicates for filter them in the my vocabulary.
like image 840
Viktorianec Avatar asked Dec 19 '22 02:12

Viktorianec


2 Answers

I tried with NSRegularExpression since CoreData & NSPredicate seems to manage them, but I couldn't have a working solution (maybe linked to my no expertise in Regex, but could be a lead). I tried also with NSCharacterSet, but it couldn't say it the number of occurrences was correct..

This maybe not the more sexy way to do it, but, here what you could do:

NSString *searchedWord = @"inputString";

NSPredicate *predicate = [NSPredicate predicateWithBlock:^BOOL(NSString *evaluatedObject, NSDictionary *bindings) {
    for (NSUInteger index = 0; index < [evaluatedObject length]; index++)
    {
        NSString *subString = [evaluatedObject substringWithRange:NSMakeRange(index, 1)];

        NSUInteger numberOfOccurrencesInSearchWord = [self occurrencesOfSubString:subString inString:searchedWord];
        NSUInteger numberOfOccurrencesInCurrentWord = [self occurrencesOfSubString:subString inString:evaluatedObject];
        if (numberOfOccurrencesInCurrentWord > numberOfOccurrencesInSearchWord)
            return FALSE;
    }
    return TRUE;
}];

//Apply this predicate to your fetch

I put occurrencesOfSubString:inString: in the class, but it could be a Category on NSString for example. You could also loop with rangeOfString:option:range if you prefer them to NSRegularExpression. Source of the code (slightly modified)

-(NSUInteger)occurrencesOfSubString:(NSString *)subString inString:(NSString *)string
{
    NSUInteger numberOfMatches = 0;
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:subString
                                                                           options:NSRegularExpressionCaseInsensitive error:&error];


    if (!error)
        numberOfMatches = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, [string length])];

    return numberOfMatches;
}

Note: To avoid too much loops, you may want to strip the evaluatedObject in order to not check repeated value. For example, if evaluatedObject = @"aaa", it will look 3 times for "a". So removing duplicated values in it may improve the speed. Here is a solution. So the code would be in the predicate block:

NSString *evaluatedWithoutRepeat = [evaluatedObject removeDuplicatedCharacters];
for (NSUInteger index = 0; index <= [evaluatedWithoutRepeat length]; index ++)
{
    NSString *subString = [evaluatedWithoutRepeat substringWithRange:NSMakeRange:(index,1)];
    //The rest would be the same.
}

WorkingTest:

NSArray *testValues = @[@"inputString",
                        @"input",
                        @"string",
                        @"put",
                        @"strinpg",
                        @"Stringpg",
                        @"stringNOTWANTED"];
NSLog(@"AllValues: %@", testValues);

NSLog(@"Test: %@", [testValues filteredArrayUsingPredicate:predicate]);

Output:

> AllValues: (
    inputString,
    input,
    string,
    put,
    strinpg,
    Stringpg,
    stringNOTWANTED
)
> Test: (
    inputString,
    input,
    string,
    put,
    strinpg
)
like image 82
Larme Avatar answered Jan 18 '23 10:01

Larme


You're on the right track with NSPredicate. And the phase you're looking for is fault tolerant search and it solved by Levenshtein distance. What you basically need to do is make an || combination with queries in single queries.

Let's assume you have all your words in NSArray. You need to call a method filteredArrayUsingPredicate: on it, BUT it's not be super easy to build a predicate like that.

So your requirements are:

  1. Search word can be a part of larger word
  2. User can misspelled word

First part are pretty easy, all you need to do is put CONTAINS to your predicate. The second part should be like ?tring or s?ring or st?ing... and can easily be build with simple for. You can experiment with various number of ? signs and see what matches your criteria.

like image 26
Jakub Avatar answered Jan 18 '23 10:01

Jakub