 

semantic similarity between sentences

Tags:

java

nlp

I'm doing a project. I need an open-source tool or technique to find the semantic similarity of two sentences: I give two sentences as input and receive a score (i.e., their semantic similarity) as output. Any help?

salma asked Jan 10 '10 17:01



1 Answer

Salma, I'm afraid this is not the right forum for your question, as it's not directly related to programming. I recommend that you ask your question again on the Corpora list. You may also want to search their archives first.

Apart from that, your question is not precise enough, and I'll explain what I mean by that. I assume that your project is about computing the semantic similarity between sentences itself, and not about something larger in which semantic similarity is just one component among many. If this is the case, then there are a few things to consider.

First of all, it is not clear, from the perspective of either computational or theoretical linguistics, what the term 'semantic similarity' means exactly. There are numerous different views and definitions of it, depending on the type of problem to be solved, the tools and techniques at hand, the background of the person approaching the task, and so on. Consider these examples:

  1. Pete and Rob have found a dog near the station.
  2. Pete and Rob have never found a dog near the station.
  3. Pete and Rob both like programming a lot.
  4. Patricia found a dog near the station.
  5. It was a dog who found Pete and Rob under the snow.

Which of the sentences 2-5 are similar to 1? 2 is the exact opposite of 1, yet it is still about Pete and Rob (not) finding a dog. 3 is about Pete and Rob, but in a completely different context. 4 is about finding a dog near the station, although the finder is someone else. 5 is about Pete, Rob, a dog, and a 'finding' event, but in a different way than in 1. As for me, I would not be able to rank these examples according to their similarity, even without having to write a computer program.
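To make the ranking problem concrete, here is a minimal sketch that scores sentences 2-5 against sentence 1 using plain word overlap (the Jaccard coefficient over lowercased word sets). The choice of Jaccard, the class name `OverlapDemo` and its helper methods are my own assumptions for illustration; this is a surface measure, not a semantic one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class OverlapDemo {

    // Lowercase the sentence, replace every run of non-letters with a space,
    // and return the set of distinct words.
    public static Set<String> tokens(String sentence) {
        String cleaned = sentence.toLowerCase(Locale.ROOT)
                                 .replaceAll("[^a-z]+", " ")
                                 .trim();
        if (cleaned.isEmpty()) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(cleaned.split(" ")));
    }

    // Jaccard coefficient: size of the intersection divided by size of the union.
    public static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        List<String> sentences = List.of(
            "Pete and Rob have found a dog near the station.",
            "Pete and Rob have never found a dog near the station.",
            "Pete and Rob both like programming a lot.",
            "Patricia found a dog near the station.",
            "It was a dog who found Pete and Rob under the snow.");

        // Compare sentence 1 with each of the others.
        for (int i = 1; i < sentences.size(); i++) {
            double score = jaccard(sentences.get(0), sentences.get(i));
            System.out.printf(Locale.ROOT, "1 vs %d: %.2f%n", i + 1, score);
        }
    }
}
```

Under this measure sentence 2, the negation of sentence 1, comes out as the most similar (10 shared words out of 11 distinct ones), which is exactly why you have to decide up front what 'similar' is supposed to mean.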

To compute semantic similarity, you first need to decide what you want to be treated as 'semantically similar' and what not. To compute it at the sentence level, you would ideally compare some kind of meaning representation of the sentences. Meaning representations normally come as logical formulas and are extremely complex to generate. However, there are tools which attempt to do this, e.g. Boxer.

As a simplistic but often practical approach, you could define the semantic similarity of two sentences as the sum of the similarities between the words in one sentence and the words in the other. This makes the problem a lot easier, although there are still some difficult issues to be addressed, since the semantic similarity of words is just as badly defined as that of sentences. If you want to get an impression of this, take a look at the book 'Lexical Semantics' by D.A. Cruse (1986).

However, there are quite a number of tools and techniques to compute the semantic similarity between words. Some of them define it basically as the negative distance between two words in a taxonomy like WordNet or the Wikipedia taxonomy (see this paper which describes an API for this). Others compute semantic similarity by using statistical measures calculated over large text corpora; they are based on the insight that similar words occur in similar contexts. A third approach to calculating semantic similarity between sentences or words is concerned with vector space models, which you may know from information retrieval. To get an overview of these latter techniques, take a look at section 8.5 of the book 'Foundations of Statistical Natural Language Processing' by Manning and Schütze.
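As a rough illustration of the word-based approach, the sketch below combines a taxonomy-distance word similarity with a simple sentence-level aggregation. Everything in it is an assumption made for the example: the tiny hand-written taxonomy stands in for a real resource such as WordNet, the score 1 / (1 + path length) is just one possible way to turn distance into similarity, and I average the best word matches in both directions rather than taking a raw sum, so that the result stays between 0 and 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TaxonomySimilarity {

    // Hypothetical toy taxonomy: each word maps to its parent; "entity" is the root.
    static final Map<String, String> PARENT = Map.of(
        "dog", "animal",
        "cat", "animal",
        "animal", "entity",
        "station", "building",
        "house", "building",
        "building", "entity");

    // Chain of nodes from the word up to its root, starting with the word itself.
    // Words missing from the taxonomy yield a chain containing only themselves.
    static List<String> pathToRoot(String word) {
        List<String> path = new ArrayList<>();
        for (String w = word; w != null; w = PARENT.get(w)) {
            path.add(w);
        }
        return path;
    }

    // 1 / (1 + edges on the path through the lowest common ancestor);
    // 0 if the two words share no ancestor at all.
    public static double wordSimilarity(String a, String b) {
        List<String> pa = pathToRoot(a);
        List<String> pb = pathToRoot(b);
        for (int i = 0; i < pa.size(); i++) {
            int j = pb.indexOf(pa.get(i));
            if (j >= 0) {
                return 1.0 / (1 + i + j);
            }
        }
        return 0.0;
    }

    static List<String> tokens(String sentence) {
        String cleaned = sentence.toLowerCase(Locale.ROOT)
                                 .replaceAll("[^a-z]+", " ")
                                 .trim();
        return cleaned.isEmpty() ? List.of() : Arrays.asList(cleaned.split(" "));
    }

    // Average, over the words in 'from', of the best match found in 'to'.
    static double directed(List<String> from, List<String> to) {
        if (from.isEmpty() || to.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (String w : from) {
            double best = 0.0;
            for (String v : to) {
                best = Math.max(best, wordSimilarity(w, v));
            }
            sum += best;
        }
        return sum / from.size();
    }

    // Symmetric sentence score: mean of the two directed averages.
    public static double sentenceSimilarity(String a, String b) {
        List<String> ta = tokens(a);
        List<String> tb = tokens(b);
        return (directed(ta, tb) + directed(tb, ta)) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(wordSimilarity("dog", "cat"));
        System.out.println(sentenceSimilarity(
            "a dog near the station", "a cat near the house"));
    }
}
```

In this toy taxonomy 'dog' and 'cat' are two edges apart, so they score 1/3, while identical words score 1. Swapping in a corpus-based or vector-based word similarity only requires replacing `wordSimilarity`; the aggregation step stays the same.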

Hope this gets you started for now.

ferdystschenko answered Sep 21 '22 14:09