How to calculate the distance in meaning of two words in Python

Question

I am wondering if it's possible to calculate the distance/similarity between two related words in Python (like "fraud" and "steal"). These two words are not synonymous per se, but they are clearly related. Are there any concepts/algorithms in NLP that can show this relationship numerically? Maybe via NLTK?

I'm not looking for the Levenshtein distance, since that is based on the individual characters that make up a word. I'm looking for how the meanings relate.

I would appreciate any help.

Dlamini · Accepted Answer

My suggestion is as follows:

Put each word through the same thesaurus to get a list of synonyms.
Count the synonyms the two words share (the size of the intersection of the two synonym sets).
That count is a measure of similarity between the words.
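Here is a minimal sketch of the steps above. It assumes the thesaurus is any function that maps a word to its synonyms. The names `synonym_overlap`, `toy_synonyms`, `wordnet_synonyms` and `TOY_THESAURUS` are hypothetical, and the toy entries are made up for illustration. `wordnet_synonyms` shows one way to plug in NLTK's WordNet. It needs `nltk` installed and the WordNet corpus downloaded with `nltk.download("wordnet")`.

```python
from typing import Callable, Iterable

# A "thesaurus" here is any function from a word to its synonyms (assumption).
Thesaurus = Callable[[str], Iterable[str]]

# Hypothetical, hand-made entries purely for illustration; not real thesaurus data.
TOY_THESAURUS = {
    "fraud": ["swindle", "deception", "scam", "cheat", "trickery"],
    "steal": ["rob", "swindle", "pilfer", "cheat", "embezzle"],
}


def toy_synonyms(word: str) -> list[str]:
    """Look a word up in the toy thesaurus; unknown words have no synonyms."""
    return TOY_THESAURUS.get(word.lower(), [])


def wordnet_synonyms(word: str) -> list[str]:
    """Collect lemma names from every WordNet synset of `word` (requires nltk + WordNet data)."""
    from nltk.corpus import wordnet as wn  # imported lazily so the toy example runs without nltk

    names = []
    for synset in wn.synsets(word):
        for name in synset.lemma_names():
            name = name.replace("_", " ").lower()
            if name != word.lower() and name not in names:
                names.append(name)
    return names


def synonym_overlap(word_a: str, word_b: str, thesaurus: Thesaurus) -> int:
    """Number of synonyms the two words share: a larger count means more similar."""
    return len(set(thesaurus(word_a)) & set(thesaurus(word_b)))


print(synonym_overlap("fraud", "steal", toy_synonyms))  # 2 (swindle, cheat)
```

A raw count favours words that simply have many synonyms. Dividing it by the size of the union of the two sets (the Jaccard index) is one common way to normalise it to the range 0 to 1.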

If you would like to do a more thorough analysis:

Also get the antonyms for each of the two words.
Get the size of the intersection of the sets of antonyms for the two words.
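The antonym step can be sketched the same way. This is again a hedged sketch. `antonym_overlap`, `toy_antonyms`, `wordnet_antonyms` and `TOY_ANTONYMS` are hypothetical names, and the toy data is made up. The WordNet variant uses NLTK's `Lemma.antonyms()` and needs the WordNet corpus.

```python
from typing import Callable, Iterable

# A function from a word to its antonyms (assumption: any source will do).
AntonymSource = Callable[[str], Iterable[str]]

# Hypothetical, hand-made antonym entries for illustration only.
TOY_ANTONYMS = {
    "buy": ["sell"],
    "purchase": ["sell"],
    "fraud": ["honesty"],
}


def toy_antonyms(word: str) -> list[str]:
    """Look a word up in the toy antonym table; unknown words have no antonyms."""
    return TOY_ANTONYMS.get(word.lower(), [])


def wordnet_antonyms(word: str) -> list[str]:
    """Collect antonyms from the lemmas of every WordNet synset of `word` (requires nltk + WordNet data)."""
    from nltk.corpus import wordnet as wn  # imported lazily so the toy example runs without nltk

    names = []
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            for antonym in lemma.antonyms():
                name = antonym.name().replace("_", " ").lower()
                if name not in names:
                    names.append(name)
    return names


def antonym_overlap(word_a: str, word_b: str, antonyms: AntonymSource) -> int:
    """Number of antonyms the two words share."""
    return len(set(antonyms(word_a)) & set(antonyms(word_b)))


print(antonym_overlap("buy", "purchase", toy_antonyms))  # 1 (both have "sell")
print(antonym_overlap("fraud", "steal", toy_antonyms))  # 0 ("steal" has no toy entry)
```

WordNet records antonym links for relatively few words, so `wordnet_antonyms` may return an empty list for many inputs.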

If you would like to go further:

Put each word through the same thesaurus, to get a list of synonyms.
Use the top n words (e.g. n = 5) from the query result to start new queries.
Repeat this to a depth you feel is adequate.
Make a collection of synonyms from the repeated synonym queries.
Count the synonyms the two collections share (the size of the intersection of the two collections).
That count is a measure of similarity between the words.
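The iterative version can be sketched as a depth-limited expansion. This is a hedged sketch that makes two assumptions. First, the thesaurus returns synonyms ranked best-first, so "top n" means the first n results. Second, every result goes into the collection, but only the top n are queried again. `expand_synonyms`, `expanded_overlap`, `toy_synonyms` and the toy data are hypothetical. Any thesaurus function, such as a WordNet-backed one, can be passed in instead.

```python
from typing import Callable, Sequence

# Assumption: the thesaurus returns synonyms ranked best-first, so "top n" means the first n.
Thesaurus = Callable[[str], Sequence[str]]

# Hypothetical, hand-made ranked entries for illustration only.
TOY_THESAURUS = {
    "fraud": ["swindle", "deception", "scam"],
    "steal": ["rob", "pilfer", "swipe"],
    "swindle": ["cheat", "con", "rob"],
    "deception": ["trickery", "cheat"],
    "rob": ["burgle", "loot", "swindle"],
    "pilfer": ["filch", "pinch"],
}


def toy_synonyms(word: str) -> Sequence[str]:
    """Ranked synonyms from the toy thesaurus; unknown words have none."""
    return TOY_THESAURUS.get(word, [])


def expand_synonyms(word: str, thesaurus: Thesaurus, top_n: int = 5, depth: int = 2) -> set[str]:
    """Collect every synonym returned while re-querying the top_n results, for `depth` rounds."""
    collected: set[str] = set()
    seen = {word}  # words already queried or queued, so nothing is queried twice
    frontier = [word]
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            results = list(thesaurus(current))
            collected.update(results)  # keep every synonym returned...
            for synonym in results[:top_n]:  # ...but only re-query the top n
                if synonym not in seen:
                    seen.add(synonym)
                    next_frontier.append(synonym)
        frontier = next_frontier
    collected.discard(word)  # the starting word may come back via its synonyms
    return collected


def expanded_overlap(word_a: str, word_b: str, thesaurus: Thesaurus, top_n: int = 5, depth: int = 2) -> int:
    """Number of synonyms shared by the two expanded collections."""
    expanded_a = expand_synonyms(word_a, thesaurus, top_n, depth)
    expanded_b = expand_synonyms(word_b, thesaurus, top_n, depth)
    return len(expanded_a & expanded_b)


print(expanded_overlap("fraud", "steal", toy_synonyms, depth=1))  # 0
print(expanded_overlap("fraud", "steal", toy_synonyms, depth=2))  # 2 (swindle, rob)
```

With the made-up data, a direct lookup (depth=1) finds nothing in common. A second round of queries surfaces the shared words "swindle" and "rob". The number of queries grows roughly as top_n ** depth, so keeping both values small keeps the cost manageable.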

Tags: python, nlp, nltk

bhat557
