Say, I have vocabulary (around 100000 words) and word ("inputstring"). So:
I need generate all words from "inputstring" like "input", "string", "put", "strinpg" etc. And then I need check out them in the my vocabulary. Could you say any good algorithm for that? Because I have only idea with:
NSPredicates
for filter them in the my vocabulary.I tried with NSRegularExpression
since CoreData & NSPredicate
seems to manage them, but I couldn't have a working solution (maybe linked to my no expertise in Regex, but could be a lead). I tried also with NSCharacterSet
, but it couldn't say it the number of occurrences was correct..
This maybe not the more sexy way to do it, but, here what you could do:
NSString *searchedWord = @"inputString";
NSPredicate *predicate = [NSPredicate predicateWithBlock:^BOOL(NSString *evaluatedObject, NSDictionary *bindings) {
for (NSUInteger index = 0; index < [evaluatedObject length]; index++)
{
NSString *subString = [evaluatedObject substringWithRange:NSMakeRange(index, 1)];
NSUInteger numberOfOccurrencesInSearchWord = [self occurrencesOfSubString:subString inString:searchedWord];
NSUInteger numberOfOccurrencesInCurrentWord = [self occurrencesOfSubString:subString inString:evaluatedObject];
if (numberOfOccurrencesInCurrentWord > numberOfOccurrencesInSearchWord)
return FALSE;
}
return TRUE;
}];
//Apply this predicate to your fetch
I put occurrencesOfSubString:inString:
in the class, but it could be a Category on NSString
for example. You could also loop with rangeOfString:option:range
if you prefer them to NSRegularExpression
. Source of the code (slightly modified)
-(NSUInteger)occurrencesOfSubString:(NSString *)subString inString:(NSString *)string
{
NSUInteger numberOfMatches = 0;
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:subString
options:NSRegularExpressionCaseInsensitive error:&error];
if (!error)
numberOfMatches = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, [string length])];
return numberOfMatches;
}
Note: To avoid too much loops, you may want to strip the evaluatedObject
in order to not check repeated value.
For example, if evaluatedObject = @"aaa"
, it will look 3 times for "a". So removing duplicated values in it may improve the speed. Here is a solution.
So the code would be in the predicate block:
NSString *evaluatedWithoutRepeat = [evaluatedObject removeDuplicatedCharacters];
for (NSUInteger index = 0; index <= [evaluatedWithoutRepeat length]; index ++)
{
NSString *subString = [evaluatedWithoutRepeat substringWithRange:NSMakeRange:(index,1)];
//The rest would be the same.
}
WorkingTest:
NSArray *testValues = @[@"inputString",
@"input",
@"string",
@"put",
@"strinpg",
@"Stringpg",
@"stringNOTWANTED"];
NSLog(@"AllValues: %@", testValues);
NSLog(@"Test: %@", [testValues filteredArrayUsingPredicate:predicate]);
Output:
> AllValues: (
inputString,
input,
string,
put,
strinpg,
Stringpg,
stringNOTWANTED
)
> Test: (
inputString,
input,
string,
put,
strinpg
)
You're on the right track with NSPredicate
. And the phase you're looking for is fault tolerant
search and it solved by Levenshtein distance. What you basically need to do is make an ||
combination with queries in single queries.
Let's assume you have all your words in NSArray
. You need to call a method filteredArrayUsingPredicate:
on it, BUT it's not be super easy to build a predicate like that.
So your requirements are:
First part are pretty easy, all you need to do is put CONTAINS
to your predicate. The second part should be like ?tring or s?ring or st?ing...
and can easily be build with simple for
. You can experiment with various number of ?
signs and see what matches your criteria.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With