
NLP: any easy and good methods to find semantic similarity between words?


I don't know whether StackOverflow covers NLP, so I'm going to give this a shot. I am interested in finding the semantic relatedness of two words from a specific domain, e.g. "image quality" and "noise". I am doing some research to determine whether camera reviews are positive or negative about a particular attribute of the camera (such as image quality in each review).

However, not everybody uses the exact wording "image quality" in their posts, so I want to see if there is a way to build something like this:

"image quality", which includes ("noise", "color", "sharpness", etc.), so I can wrap everything under one big umbrella.

I am doing this for another language, so WordNet is not necessarily helpful. And no, I do not work for Google or Microsoft, so I do not have data on people's click behaviour as input either.

However, I do have a lot of text, POS-tagged, segmented, etc.

asked Mar 14 '10 06:03 by sadawd




1 Answer

Check out the Google similarity distance (http://arxiv.org/abs/cs.CL/0412098). The idea: if lots of web pages include both words, they're probably related.

There is a demo program at http://mechanicalcinderella.com
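
Since you already have a lot of segmented text, you may not even need web hit counts. Below is a minimal sketch of the normalized distance formula from that paper, computed from document co-occurrence counts in a local corpus instead of search-engine page counts (that substitution is my assumption, not something the paper's demo does). The function name `normalized_distance` and the "list of token sets" input format are made up for illustration.

```python
import math

def normalized_distance(docs, x, y):
    """NGD-style distance between terms x and y.

    docs: list of sets of tokens, one set per document (e.g. per review).
    Returns 0.0 for terms that always co-occur and math.inf for terms
    that never co-occur (or never occur at all).
    """
    n = len(docs)
    fx = sum(1 for d in docs if x in d)              # documents containing x
    fy = sum(1 for d in docs if y in d)              # documents containing y
    fxy = sum(1 for d in docs if x in d and y in d)  # documents containing both

    if fx == 0 or fy == 0 or fxy == 0:
        return math.inf

    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(log_fx, log_fy)
    if denom == 0:
        # Both terms occur in every document, so the numerator is 0 as well.
        return 0.0
    return (max(log_fx, log_fy) - log_fxy) / denom


# Toy corpus: each review is reduced to its set of tokens.
reviews = [
    {"image_quality", "noise", "sharpness"},
    {"image_quality", "noise"},
    {"battery", "price"},
    {"battery"},
]
print(normalized_distance(reviews, "image_quality", "noise"))
print(normalized_distance(reviews, "image_quality", "battery"))
```

Smaller values mean "more related". To build the umbrella for "image quality", you could rank candidate nouns by this distance and keep the ones below a threshold you tune by hand.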

Other than that, you could try to translate a project like WordNet (Google Translate could help), or start a collaborative ontology.

answered Jan 01 '23 08:01 by Sweet Burlap