The story behind
I am creating a voice controlled application using x-webkit-speech
which is surprisingly good (the feature, not my app), but sometimes the user (me) mumbles a bit. It would be nice to accept the command if some reasonable part of the word matches some reasonable part of some reasonable command. So I search for the holy grail called Algorithm of the Greatest Intersect of Word in Set of Words. Could some fresh bright mind drive me out of the cave of despair?
Example
"rotation" in ["notable","tattoo","onclick","statistically"]
should match tattoo because it has the longest intersect with rotation (tat_o). statistically is the second best (tati intersect), because longer part of the word needs to be ignored (but this is bonus condition, it would be acceptable without it).
Notes
What have I tried?
Well, it is pretty embarassing....
for(var i=10; i>=4; --i) // reasonable substring
for(var word in words) // for all words in the set
for(var j=0; j<word.length-i; ++j) // search for any i substring
// aaargh... three levels of abstraction is too much for me
This is an algorithm that seems to work. I have no idea how good it performs compared to other already established algorithms (I suspect it perform worse) but maybe it gives you an idea how you could do it:
FIDDLE
var minInt = 3;
var arr = ["notable","tattoo","onclick","statistically"];
var word = "rotation";
var res = [];
if (word.length >= minInt) {
for (var i = 0; i < arr.length; i++) {
var comp = arr[i];
var m = 0;
if (comp.length >= minInt) {
for (var l = 0; l < comp.length - minInt + word.length - minInt + 1; l++) {
var subcomp = l > word.length - minInt ? comp.substring(l - word.length + minInt) : comp;
var subword = l < word.length - minInt ? word.substring(word.length - minInt - l) : word;
var minL = Math.min(subcomp.length, subword.length);
var matches = 0;
for (var k = 0; k < minL; k++) {
if (subcomp[k] === subword[k]) {
matches++;
}
}
if (matches > m) {
m = matches;
}
}
}
res[i] = m >= minInt ? m : null;
}
}
console.log(res);
What happens is, that it compares the two strings by "moving" on against the other and calculates the matching letters in each position. Here you see the compared "sub"words for rotation vs. notable
:
ion / notable --> one match on index 1
tion / notable --> no match
ation / notable --> no match
tation / notable --> one match on index 2
otation / notable --> no match
rotation / notable --> three matches on index 1,2,3
rotation / otable --> no match
rotation / table --> no match
rotation / able --> no match
rotation / ble --> no match
As you see, the maximum number of matches is 3 and that is what it would return.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With