How to calculate the distance in meaning of two words in Python

Tags: python, nlp, nltk

I am wondering if it's possible to calculate the distance/similarity between two related words in Python (like "fraud" and "steal"). These two words are not synonymous per se but they are clearly related. Are there any concepts/algorithms in NLP that can show this relationship numerically? Maybe via NLTK?

I'm not looking for the Levenshtein distance as that relates to the individual characters that make up a word. I'm looking for how the meaning relates.

Would appreciate any help provided.

bhat557 asked Oct 29 '22 09:10


1 Answer

My suggestion is as follows:

  • Put each word through the same thesaurus to get a list of synonyms for each.
  • Take the intersection of the two synonym lists and count its members.
  • That count is a measure of similarity between the words: the more synonyms they share, the closer they are in meaning.
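
The steps above can be sketched roughly as follows. The thesaurus here is a tiny hand-written dictionary invented purely for illustration (the names `TOY_THESAURUS`, `synonyms`, `synonym_overlap` and `jaccard_similarity` are all hypothetical); in practice the lookup would come from a real thesaurus such as WordNet via NLTK. The Jaccard ratio is my own optional addition, not part of the suggestion, and is only there so that words with long synonym lists do not automatically score higher.

```python
# Toy thesaurus, invented for illustration only; replace with a real lookup.
TOY_THESAURUS = {
    "fraud": {"deception", "swindle", "scam", "theft"},
    "steal": {"rob", "pilfer", "swindle", "theft"},
    "banana": {"plantain"},
}


def synonyms(word, thesaurus=TOY_THESAURUS):
    """Return the set of synonyms for a word (empty set if unknown)."""
    return set(thesaurus.get(word, ()))


def synonym_overlap(word_a, word_b, thesaurus=TOY_THESAURUS):
    """Count the synonyms the two words have in common."""
    return len(synonyms(word_a, thesaurus) & synonyms(word_b, thesaurus))


def jaccard_similarity(word_a, word_b, thesaurus=TOY_THESAURUS):
    """Shared synonyms divided by all synonyms (assumed normalisation)."""
    a = synonyms(word_a, thesaurus)
    b = synonyms(word_b, thesaurus)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(synonym_overlap("fraud", "steal"))     # shared synonym count
    print(jaccard_similarity("fraud", "steal"))  # normalised to 0..1
```

With the toy data, "fraud" and "steal" share "swindle" and "theft", while "fraud" and "banana" share nothing.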

If you would like to do a more thorough analysis:

  • Also get the antonyms for each of the two words.
  • Get the size of the intersection of the sets of antonyms for the two words.
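
A minimal sketch of the antonym step, again using small made-up dictionaries (`TOY_SYNONYMS` and `TOY_ANTONYMS` are hypothetical stand-ins for a real thesaurus). How the two counts should be combined into one score is not specified above, so this sketch simply returns both.

```python
# Toy data, invented for illustration only; replace with a real thesaurus.
TOY_SYNONYMS = {
    "fraud": {"deception", "swindle", "theft"},
    "steal": {"rob", "swindle", "theft"},
}
TOY_ANTONYMS = {
    "fraud": {"honesty", "integrity"},
    "steal": {"give", "honesty"},
}


def shared_count(word_a, word_b, lookup):
    """Size of the intersection of the two words' entries in a lookup table."""
    return len(set(lookup.get(word_a, ())) & set(lookup.get(word_b, ())))


def similarity_evidence(word_a, word_b):
    """Return (shared synonyms, shared antonyms) for the two words."""
    return (
        shared_count(word_a, word_b, TOY_SYNONYMS),
        shared_count(word_a, word_b, TOY_ANTONYMS),
    )


if __name__ == "__main__":
    print(similarity_evidence("fraud", "steal"))
```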

If you would like to go further:

  • Put each word through the same thesaurus, to get a list of synonyms.
  • Use the top n (=5, or whatever) words from the query result to initiate a new query.
  • Repeat this to a depth you feel is adequate.
  • Collect all the synonyms returned by the repeated queries into one collection per word.
  • Take the intersection of the two collections and count its members.
  • That count is a measure of similarity between the words.
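
The repeated-query version might look something like this. It is only a sketch under assumptions: the thesaurus is a made-up dictionary whose lists are treated as already ranked (so "top n" means the first n entries), and the names `expand_synonyms` and `expanded_overlap` are hypothetical. A real thesaurus lookup would replace `thesaurus.get(...)`.

```python
# Toy ranked thesaurus, invented for illustration only.
TOY_THESAURUS = {
    "fraud": ["deception", "swindle", "scam"],
    "steal": ["rob", "pilfer", "swindle"],
    "deception": ["trickery", "deceit"],
    "swindle": ["cheat", "con"],
    "scam": ["con", "hoax"],
    "rob": ["loot", "burgle"],
    "pilfer": ["filch", "loot"],
}


def expand_synonyms(word, depth=2, top_n=5, thesaurus=TOY_THESAURUS):
    """Collect synonyms by re-querying the top_n results, depth times."""
    collected = set()
    queried = set()
    frontier = [word]
    for _ in range(depth):
        next_frontier = []
        for term in frontier:
            if term in queried:  # never query the same term twice
                continue
            queried.add(term)
            results = thesaurus.get(term, [])
            collected.update(results)
            # Only the first top_n results seed the next round of queries.
            next_frontier.extend(results[:top_n])
        frontier = next_frontier
    collected.discard(word)  # a word is not counted as its own synonym
    return collected


def expanded_overlap(word_a, word_b, depth=2, top_n=5):
    """Count the synonyms shared by the two expanded collections."""
    return len(
        expand_synonyms(word_a, depth, top_n)
        & expand_synonyms(word_b, depth, top_n)
    )


if __name__ == "__main__":
    for d in (1, 2):
        print(d, expanded_overlap("fraud", "steal", depth=d))
```

Going deeper finds more indirect links between the words, but it also pulls in looser associations, so the depth and n are worth tuning on a few word pairs whose relatedness you already know.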
Dlamini answered Nov 09 '22 10:11