I'm writing something that takes a block of text and breaks it down into possible database queries that could be used to find similar blocks of text. (something similar to the "similar questions" list being generated while I type this) The basic process:
Here's what I have so far:
    //baseList starts with an empty array
    //candList starts with the array of unique stems
    //target is where the arrays of unique combinations are stored
    function createUniqueCombos(baseList,candList,target){
    for(var i=0;i<candList.length;i++){         
        //copy the base List
        var newList = baseList.slice(0);
        //add the candidate list item to the base list copy
        newList.push(candList[i]);
        //add the new array to the target array
        target.push(newList);   
        //re-call function using new array as baseList
        //and remaining candidates as candList
        var nextCandList = candList.slice(i + 1);       
        createUniqueCombos(newList,nextCandList,target);
    }
}
This works, but on blocks of text larger than 25 words or so, it crashes my browser. I realize that mathematically there could be a huge number of possible combinations. What I'd like to know is:
I think your logic is fundamentally flawed because of how many combinations you're creating.
An approach I'd take would be;
split_words)blocks) which has columns block_id and word
Have a SQL query such as
SELECT block_id FROM blocks 
WHERE word IN (split_words) GROUP BY block_id 
ORDER BY COUNT(*) DESC
and then you'll have a list of block_ids which are ordered depending on how many words in common the blocks have.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With