Does anyone know of a good way to calculate the "semantic distance" between two words? An algorithm that counts the steps between words in a thesaurus immediately springs to mind.

OK, it looks like a similar question has already been answered: "Is there an algorithm that tells the semantic similarity of two phrases?"


Calculating the semantic distance between words

2 Answers

In text mining there is an important maxim: "You shall know a word by the company it keeps." It means that you can learn the meaning of a word from the terms that frequently appear close to it.

Without going into too much detail, here are two simple ways to estimate the semantic distance between terms:

1. Use a resource such as WordNet (a large lexical database of English). WordNet superficially resembles a thesaurus in that it groups words together based on their meanings (into sets of synonyms linked by relations such as "is a kind of"). The semantic distance between two words can then be estimated as the length of the shortest path (the number of edges) connecting them in this network.
2. Use a large corpus (e.g. Wikipedia) to count the terms that appear close to each of the words you are analyzing. Build one co-occurrence vector per word and compute a distance between the two vectors (e.g. cosine distance).
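As a rough illustration of the second option, here is a minimal Python sketch that counts co-occurring words within a fixed window and compares the resulting vectors with cosine distance. The toy corpus, the window size of 2 and the function names are illustrative assumptions; a real system would use a much larger corpus and usually reweights the raw counts (e.g. with pointwise mutual information) before comparing vectors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_vector(target, tokens, window=2):
    """Count the words within `window` positions of each occurrence of `target`.

    For simplicity the window may cross sentence boundaries.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # no context was observed for at least one of the words
    return 1.0 - dot / (norm_u * norm_v)


# Toy corpus standing in for something like Wikipedia.
corpus = tokenize(
    "The cat drinks milk. The dog drinks water. "
    "A cat chases a mouse. A dog chases a cat. "
    "Stocks fell as the market closed."
)

cat = cooccurrence_vector("cat", corpus)
dog = cooccurrence_vector("dog", corpus)
market = cooccurrence_vector("market", corpus)
print(round(cosine_distance(cat, dog), 3))     # 0.129
print(round(cosine_distance(cat, market), 3))  # 0.868
```

Here "cat" ends up much closer to "dog" than to "market" because the two animals share contexts such as "drinks" and "chases".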

You can check these materials to get a better picture of the subject:

http://www.saifmohammad.com/WebDocs/Mohammad_Saif_Thesis-slides.pdf
http://www.umiacs.umd.edu/~saif/WebDocs/distributionalmeasures.pdf
http://www.umiacs.umd.edu/~saif/WebDocs/Measuring-Semantic-Distance.pdf


answered Oct 31 '22 04:10

mariolpantunes

The thesaurus idea has some merit. One idea would be to create a graph based on a thesaurus, with the nodes being the words and an edge indicating that two words are listed as synonyms in the thesaurus. You could then use a shortest-path algorithm to compute the distance between the nodes as a measure of their similarity: the fewer hops, the more similar the words.
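A minimal Python sketch of this idea, assuming the thesaurus is available as a list of synonym pairs (the tiny word list below is made up for illustration, and the function names are hypothetical). Since every edge has the same weight, breadth-first search is enough to find the shortest path; the same function works on any word graph, including one built from WordNet relations.

```python
from collections import deque


def build_graph(synonym_pairs):
    """Undirected graph: nodes are words, an edge joins two listed synonyms."""
    graph = {}
    for a, b in synonym_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def semantic_distance(graph, source, target):
    """Number of synonym hops on a shortest path (BFS), or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    dist = {source: 0}
    queue = deque([source])
    while queue:
        word = queue.popleft()
        if word == target:
            return dist[word]
        for neighbour in graph[word]:
            if neighbour not in dist:
                dist[neighbour] = dist[word] + 1
                queue.append(neighbour)
    return None  # the two words lie in disconnected parts of the graph


# Hypothetical toy thesaurus entries.
thesaurus = build_graph([
    ("happy", "glad"), ("glad", "pleased"), ("pleased", "content"),
    ("happy", "cheerful"), ("sad", "unhappy"), ("unhappy", "miserable"),
])

print(semantic_distance(thesaurus, "happy", "content"))  # 3
print(semantic_distance(thesaurus, "happy", "sad"))      # None
```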

One difficulty is that some words have different meanings in different contexts. Your algorithm may need to take this into account, for example by using directed links where the weight of an outgoing link depends on which incoming link was followed to reach the word (or by ignoring some outgoing links depending on the incoming link).
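One possible way to realize these context-dependent links is sketched below, under the assumption that each synonym entry carries a sense label (the labels, the words and the `switch_penalty` parameter are all hypothetical). The search runs Dijkstra's algorithm over (word, sense of the incoming link) states, so the cost of an outgoing link depends on which link was followed to reach the word: staying within one sense costs 1 per hop, while switching senses adds a penalty.

```python
import heapq
import itertools


def build_sense_graph(edges):
    """edges: (word_a, word_b, sense) triples, stored in both directions."""
    graph = {}
    for a, b, sense in edges:
        graph.setdefault(a, []).append((b, sense))
        graph.setdefault(b, []).append((a, sense))
    return graph


def sense_aware_distance(graph, source, target, switch_penalty=2.0):
    """Dijkstra over (word, sense of the link used to arrive) states.

    Every hop costs 1; a hop whose sense differs from the incoming link's
    sense costs an extra `switch_penalty`. The first hop from `source` has
    no incoming link, so it is never penalised. Returns None if unreachable.
    """
    best = {(source, None): 0.0}
    tie = itertools.count()  # tie-breaker so the heap never compares senses
    heap = [(0.0, next(tie), source, None)]
    while heap:
        cost, _, word, in_sense = heapq.heappop(heap)
        if cost > best[(word, in_sense)]:
            continue  # stale entry: a cheaper route to this state was found
        if word == target:
            return cost
        for neighbour, sense in graph.get(word, []):
            step = 1.0
            if in_sense is not None and sense != in_sense:
                step += switch_penalty
            state = (neighbour, sense)
            new_cost = cost + step
            if new_cost < best.get(state, float("inf")):
                best[state] = new_cost
                heapq.heappush(heap, (new_cost, next(tie), neighbour, sense))
    return None


# Hypothetical sense-labelled synonym entries: "bank" has two senses.
sense_graph = build_sense_graph([
    ("money", "bank", "finance"), ("bank", "lender", "finance"),
    ("bank", "shore", "river"), ("shore", "coast", "river"),
])

print(sense_aware_distance(sense_graph, "money", "lender"))  # 2.0
print(sense_aware_distance(sense_graph, "money", "shore"))   # 4.0
print(sense_aware_distance(sense_graph, "money", "shore",
                           switch_penalty=float("inf")))     # None
```

With `switch_penalty=float("inf")` a sense switch is never taken, which corresponds to ignoring some outgoing links based on the incoming link.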

answered Oct 31 '22 06:10

tvanfosson


Tags: algorithm

Ben Aston
