I'm writing something that takes a block of text and breaks it down into possible database queries that could be used to find similar blocks of text (something like the "similar questions" list that is generated while I type this). The basic process:
Here's what I have so far:
```javascript
// baseList starts with an empty array
// candList starts with the array of unique stems
// target is where the arrays of unique combinations are stored
function createUniqueCombos(baseList, candList, target) {
    for (var i = 0; i < candList.length; i++) {
        // copy the base list
        var newList = baseList.slice(0);
        // add the candidate list item to the base list copy
        newList.push(candList[i]);
        // add the new array to the target array
        target.push(newList);
        // re-call the function using the new array as baseList
        // and the remaining candidates as candList
        var nextCandList = candList.slice(i + 1);
        createUniqueCombos(newList, nextCandList, target);
    }
}
```
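One way to see why larger inputs stall is to run the function above on a few small inputs and count what it produces. This is a minimal sketch: it repeats createUniqueCombos unchanged so the snippet runs on its own, and the helper countCombos and the generated stem names are hypothetical.

```javascript
// Same recursive generator as in the question, repeated so this snippet is self-contained.
function createUniqueCombos(baseList, candList, target) {
  for (var i = 0; i < candList.length; i++) {
    var newList = baseList.slice(0);
    newList.push(candList[i]);
    target.push(newList);
    createUniqueCombos(newList, candList.slice(i + 1), target);
  }
}

// Hypothetical helper: number of combinations generated for n distinct stems.
function countCombos(n) {
  var stems = [];
  for (var i = 0; i < n; i++) {
    stems.push("stem" + i);
  }
  var target = [];
  createUniqueCombos([], stems, target);
  return target.length;
}

var example = [];
createUniqueCombos([], ["a", "b", "c"], example);
console.log(example.map(function (combo) { return combo.join(","); }).join(" | "));
// a | a,b | a,b,c | a,c | b | b,c | c

for (var n = 1; n <= 16; n *= 2) {
  // the count is always 2^n - 1: every non-empty subset of the stems
  console.log(n, countCombos(n));
}
```

The counts grow as 1, 3, 15, 255, 65535 for 1, 2, 4, 8 and 16 stems, so each extra word roughly doubles both the work and the memory held in target.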
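This works, but on blocks of text longer than about 25 words it crashes my browser. I realize that mathematically there could be a huge number of possible combinations: every non-empty subset of the n unique stems is generated, which is 2^n - 1 arrays (over 33 million for 25 stems). What I'd like to know is: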
I think your logic is fundamentally flawed because of how many combinations you're creating.
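An approach I'd take would be:
- split the new block of text into its unique words (split_words)
- store every existing block in a table (blocks) which has columns block_id and word, with one row per distinct word in the block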
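A minimal sketch of preparing those two inputs, assuming words are lower-cased and split on anything that is not an ASCII letter, digit or apostrophe (no stemming is done here). The function names splitWords and toBlockRows are hypothetical. Storing each word only once per block keeps COUNT(*) in the query below a count of distinct shared words.

```javascript
// Hypothetical helper: lower-case the text and keep its unique words,
// in order of first appearance. Only ASCII letters, digits and apostrophes count as word characters.
function splitWords(text) {
  var words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  return Array.from(new Set(words));
}

// Hypothetical helper: one { block_id, word } row per distinct word,
// matching the columns of the blocks table.
function toBlockRows(blockId, text) {
  return splitWords(text).map(function (word) {
    return { block_id: blockId, word: word };
  });
}

var split_words = splitWords("How do I find similar blocks of text? Similar TEXT, please.");
console.log(split_words.join(" "));
// how do i find similar blocks of text please

console.log(JSON.stringify(toBlockRows(7, "Text blocks, text rows")));
// [{"block_id":7,"word":"text"},{"block_id":7,"word":"blocks"},{"block_id":7,"word":"rows"}]
```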
Then run a SQL query such as:

```sql
-- split_words stands for the list of words taken from the new text
SELECT block_id
FROM blocks
WHERE word IN (split_words)
GROUP BY block_id
ORDER BY COUNT(*) DESC;
```

This gives you a list of block_ids ordered by how many words each block has in common with the new text.
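For illustration, the same ranking can be reproduced in memory, which may help when checking what the query returns. This sketch assumes rows shaped like the blocks table ({ block_id, word }); rankBlocks is a hypothetical name, and it is an equivalent of the query's logic, not a database client.

```javascript
// Hypothetical in-memory equivalent of:
//   SELECT block_id FROM blocks WHERE word IN (split_words)
//   GROUP BY block_id ORDER BY COUNT(*) DESC
function rankBlocks(rows, splitWords) {
  var wanted = new Set(splitWords);
  var counts = new Map(); // block_id -> number of matching rows
  rows.forEach(function (row) {
    if (wanted.has(row.word)) { // WHERE word IN (split_words)
      // GROUP BY block_id, COUNT(*)
      counts.set(row.block_id, (counts.get(row.block_id) || 0) + 1);
    }
  });
  return Array.from(counts.entries())
    .sort(function (a, b) { return b[1] - a[1]; }) // ORDER BY COUNT(*) DESC
    .map(function (entry) { return entry[0]; });
}

var rows = [
  { block_id: 1, word: "database" },
  { block_id: 1, word: "query" },
  { block_id: 2, word: "similar" },
  { block_id: 2, word: "text" },
  { block_id: 2, word: "query" },
  { block_id: 3, word: "browser" }
];

console.log(rankBlocks(rows, ["similar", "text", "query"]).join(", "));
// 2, 1
```

Blocks with no shared words (block 3 here) never appear, just as with the WHERE clause. Ties keep the order in which block ids were first seen, whereas SQL leaves the order of tied rows unspecified. When running the real query, split_words would normally be passed as bound parameters (one placeholder per word) rather than pasted into the SQL string.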