Using WordNet to determine semantic similarity between two texts?

Question

How can you determine the semantic similarity between two texts in Python using WordNet?

The obvious preprocessing would be removing stop words and stemming, but then what?

The only way I can think of would be to calculate the WordNet path distance between each pair of words in the two texts. This is standard for unigrams. But these are large (400-word) natural language documents, with words that are not in any particular order or structure (other than those imposed by English grammar). So which words would you compare between the texts? And how would you do this in Python?
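One possible way to turn word-level path similarity into a single score for two whole texts is sketched below: for every word in one text, take its best match in the other text, average those scores, and do the same in the opposite direction. This is only a sketch under some assumptions. The function names are made up for illustration, the WordNet lookup assumes NLTK is installed and `nltk.download("wordnet")` has been run, and the input words are assumed to be already tokenized with stop words removed. Note that WordNet lookups need dictionary forms, so lemmas are likely to work better than stems here.

```python
from typing import Callable, Iterable


def wordnet_path_similarity(word_a: str, word_b: str) -> float:
    """Best WordNet path similarity over all synset pairs (0.0 if none)."""
    # Imported lazily so the aggregation code below works without NLTK.
    from nltk.corpus import wordnet as wn

    best = 0.0
    for syn_a in wn.synsets(word_a):
        for syn_b in wn.synsets(word_b):
            score = syn_a.path_similarity(syn_b)  # may be None
            if score is not None and score > best:
                best = score
    return best


def directed_similarity(
    source: Iterable[str],
    target: Iterable[str],
    word_sim: Callable[[str, str], float],
) -> float:
    """Average, over source words, of the best match found in target."""
    source, target = list(source), list(target)
    if not source or not target:
        return 0.0
    total = sum(max(word_sim(s, t) for t in target) for s in source)
    return total / len(source)


def text_similarity(
    words_a: Iterable[str],
    words_b: Iterable[str],
    word_sim: Callable[[str, str], float] = wordnet_path_similarity,
) -> float:
    """Symmetric score: mean of the two directed similarities."""
    set_a, set_b = set(words_a), set(words_b)  # word order is ignored
    return 0.5 * (
        directed_similarity(set_a, set_b, word_sim)
        + directed_similarity(set_b, set_a, word_sim)
    )
```

Because the word-level measure is passed in as `word_sim`, you could swap path similarity for another measure without touching the aggregation. For two 400-word texts this compares every pair of distinct words, so caching the synset lookups may be worthwhile.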

inspectorG4dget · Accepted Answer

One thing that you can do is:

Remove the stop words.
Find the words whose sets of synonyms and antonyms overlap most with those of other words in the same document. Let's call these "the important words".
Compare the sets of important words from the two documents. The more they overlap, the more semantically similar your documents are.
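The steps above could be sketched roughly as follows. This is one interpretation, not a definitive implementation: the scoring rule (count shared synonyms/antonyms with every other word in the document), the `top_k` cutoff, and the use of Jaccard overlap to compare the two sets are all assumptions, and the function names are hypothetical. The WordNet lookup assumes NLTK with the `wordnet` corpus downloaded.

```python
from typing import Callable, Iterable, Set


def wordnet_related(word: str) -> Set[str]:
    """Synonyms and antonyms of a word, collected from its WordNet lemmas."""
    # Imported lazily so the rest of the sketch works without NLTK.
    from nltk.corpus import wordnet as wn

    related = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            related.add(lemma.name().lower())
            for antonym in lemma.antonyms():
                related.add(antonym.name().lower())
    return related


def important_words(
    words: Iterable[str],
    related: Callable[[str], Set[str]] = wordnet_related,
    top_k: int = 20,
) -> Set[str]:
    """Words whose related sets overlap most with other words in the doc."""
    vocab = sorted(set(words))
    neighbours = {w: related(w) | {w} for w in vocab}
    scores = {
        w: sum(len(neighbours[w] & neighbours[o]) for o in vocab if o != w)
        for w in vocab
    }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return {w for w in ranked[:top_k] if scores[w] > 0}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets in [0, 1]; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

A document-level score would then be something like `jaccard(important_words(doc_a_words), important_words(doc_b_words))`, where both word lists already have stop words removed.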

There is another way. Compute sentence trees out of the sentences in each document, then compare the two forests. I did some similar work for a course a long time ago. Here's the code (keep in mind this was a long time ago and it was for a class, so the code is extremely hacky, to say the least).

Hope this helps

Tags:

python

nlp

nltk

wordnet

semantic-analysis

Zach
