I'm trying to find out which would be the most optimal way of intersection a set of texts and find the common words in them. Given this scenario:
var t1 = 'My name is Mary-Ann, and I come from Kansas!';
var t2 = 'John, meet Mary, she comes from far away';
var t3 = 'Hi Mary-Ann, come here, nice to meet you!';
intersection result should be:
var result =["Mary"];
It should be able to ignore punctuation marks like .,!?-
Would a solution with regular expressions be optimal?
Here's a tested solution :
function intersect() {
var set = {};
[].forEach.call(arguments, function(a,i){
var tokens = a.match(/\w+/g);
if (!i) {
tokens.forEach(function(t){ set[t]=1 });
} else {
for (var k in set){
if (tokens.indexOf(k)<0) delete set[k];
}
}
});
return Object.keys(set);
}
This function is variadic, you can call it with any number of texts :
console.log(intersect(t1, t2, t3)) // -> ["Mary"]
console.log(intersect(t1, t2)) // -> ["Mary", "from"]
console.log(intersect()) // -> []
If you need to support non English languages, then this regex won't be enough because of the poor support of Unicode in JavaScript regexes. Either you use a regex library or you define your regex by explicitly excluding characters as in a.match(/[^\s\-.,!?]+/g);
(this will probably be enough for you) .
Detailed explanation :
The idea is to fill a set with the tokens of the first text and then remove from the set the tokens missing in the other texts.
Object.create(null)
to avoid a prototype, I like the simplicity of {}
.arguments
instead of defining the passed texts as explicit arguments.arguments
isn't a real array, so to iterate over it you need either a for
loop or a trick like [].forEach.call
. It works because arguments
is "array-like".match
to match words, nothing special here (see note above regarding better support of other languages, though)!i
to check if it's the first text. In that case, I simply copy the tokens as properties in the set. A value must be used, I use 1
. In the future, ES6 sets will make the intent more obvious here.tokens.indexOf(k)<0
)Object.keys
.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With