
Find different word between two strings

I have a large list of phrases such as:

"Nola jumped off the cliff"
"Leroy jumped off the cliff"
"Nola jumped off the couch"
"Leroy lept off the couch"

I need to find each position in a phrase where the word differs and add that word to a node, which is a list of the words that can be used at that position in a phrase. So we would end up with:

"Node1(1) Node2(1) off the Node3(1)"
"Node1(2) Node2(1) off the Node3(1)"
...etc

Here node1 represents the list of names (Nola, Leroy), node2 the list of actions (jumped, lept), and node3 the list of locations (cliff, couch).

The idea is to take a list of phrases and automatically create the nodes, filling each one with the words that can be used at that position in a phrase.

First, how would I generate the list of phrase nodes? I haven't been able to figure out how to compare two sentences and see whether they are exactly alike except for one word.

Second, once I have the nodes set up, what would be the best way to go through all the combinations of the nodes to come up with new matches? (I hope that makes sense.)

SpectralEdge asked Mar 01 '12 20:03


2 Answers

Nice one, I like it. Since you tagged your question with C#, I wrote the answer in C# as well.

A fast way to get the different words between two phrases:

using System.Linq;

string phrase1 = "Nola jumped off the cliff";
string phrase2 = "Juri jumped off the coach";

// Split the phrases into word arrays
var phrase1Words = phrase1.Split(' ');
var phrase2Words = phrase2.Split(' ');

// Find the intersection of the two arrays (the distinct words that occur in both)
var wordsInPhrase1and2 = phrase1Words.Intersect(phrase2Words).ToArray();

// The number of words in phrase1 that do not occur in phrase2
int wordDelta = phrase1Words.Length - wordsInPhrase1and2.Length;

// Find the differing words
var wordsOnlyInPhrase1 = phrase1Words.Except(wordsInPhrase1and2);
var wordsOnlyInPhrase2 = phrase2Words.Except(wordsInPhrase1and2);

Instead of looping over the arrays and comparing each element yourself, you can save time by using the built-in LINQ methods Intersect, Except, and so on.

For creating phrases at random, please refer to NominSim's answer.
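Note that Intersect and Except treat the words as sets, so they ignore where in the phrase a word appears. If the position matters, as it seems to in the question, a position-by-position comparison may be closer to what is needed. The following is only a minimal sketch: the class and method names (PhraseDiff, DifferingPositions, DiffersByOneWord) are made up for illustration, and it assumes words are separated by single spaces.

```csharp
using System;
using System.Linq;

public static class PhraseDiff
{
    // Hypothetical helper: indices at which two equally long word arrays differ.
    public static int[] DifferingPositions(string[] a, string[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Phrases must have the same number of words.");

        return Enumerable.Range(0, a.Length)
                         .Where(i => a[i] != b[i])
                         .ToArray();
    }

    // True when both phrases have the same word count and differ in exactly one position.
    public static bool DiffersByOneWord(string phrase1, string phrase2)
    {
        var a = phrase1.Split(' ');
        var b = phrase2.Split(' ');
        return a.Length == b.Length && DifferingPositions(a, b).Length == 1;
    }
}
```

With this sketch, DiffersByOneWord("Nola jumped off the cliff", "Nola jumped off the couch") is true, and DifferingPositions reports index 4 for that pair.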

hotS85 answered Oct 31 '22 07:10


Yet another LINQ-based solution that generates all possible combinations:

var phrases = new List<string> {
    "Nola jumped off the cliff",
    "Leroy jumped off the cliff",
    "Nola jumped off the couch",
    "Leroy lept off the couch"
};

var sets = (from p in phrases
            from indexedWord in p.Split(' ').Select((word,idx) => new {idx,word})
            group indexedWord by indexedWord.idx into g
            select g.Select(e => e.word).Distinct()).ToArray();


var allCombos = from w1 in sets[0]
                from w2 in sets[1]
                from w3 in sets[2]
                from w4 in sets[3]
                from w5 in sets[4]
                select String.Format("{0} {1} {2} {3} {4}.", w1, w2, w3, w4, w5);

Doesn't make for the most readable code, but was fun writing. =)
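The five nested from clauses assume that every phrase has exactly five words. For an arbitrary number of positions, the combinations could be built with Aggregate instead. This is only a sketch of that idea: PhraseCombos, CartesianProduct and AllPhrases are hypothetical names, and the input is assumed to have the same shape as the sets array above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhraseCombos
{
    // Hypothetical helper: Cartesian product of any number of word sets.
    public static IEnumerable<IEnumerable<string>> CartesianProduct(
        IEnumerable<IEnumerable<string>> sets)
    {
        // Start with one empty combination and extend it by one set at a time.
        IEnumerable<IEnumerable<string>> seed = new[] { Enumerable.Empty<string>() };
        return sets.Aggregate(seed, (acc, set) =>
            from combo in acc
            from word in set
            select combo.Concat(new[] { word }));
    }

    // Joins every combination back into a space-separated phrase.
    public static List<string> AllPhrases(IEnumerable<IEnumerable<string>> sets)
    {
        return CartesianProduct(sets)
            .Select(words => string.Join(" ", words))
            .ToList();
    }
}
```

Calling PhraseCombos.AllPhrases(sets) should then yield one phrase per combination, regardless of how many word positions the phrases have.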

afrischke answered Oct 31 '22 07:10