Syntactic similarity/distance between 2 sentences/string/text using nltk [duplicate]

I have two texts:

Text1 : John likes apple

Text2 : Mike hates orange

If you compare the two texts, they are syntactically similar but semantically different.

I want to find

1) Syntactic distance between 2 texts

2) Semantic distance between 2 texts

Is there a way to do this using nltk? I am new to NLP.

Ganesh Deshvini asked Aug 16 '16 13:08



2 Answers

Yes, but you are not limited to nltk. One approach to syntactic distance is part-of-speech (POS) tagging, which maps each word of a sentence to a specific tag: https://en.wikipedia.org/wiki/Part-of-speech_tagging

For example, it maps your sentences to these tag sequences:
Text1: Noun Verb Noun
Text2: Noun Verb Noun

Then you can measure the distance between the two tag sequences. Here they are identical, so the syntactic distance is zero.
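As a rough sketch of that idea, the snippet below tags each sentence with nltk and compares the tag sequences with a Levenshtein (edit) distance. Edit distance is only one possible choice of metric and is my assumption, not something the answer prescribes. The helper names pos_tags, tag_distance and syntactic_distance are hypothetical, tokenization is a naive whitespace split, and nltk.pos_tag needs its tagger data to be downloaded beforehand.

```python
from typing import List, Sequence


def pos_tags(sentence: str) -> List[str]:
    """Return the POS tag of each whitespace-separated token."""
    import nltk  # imported lazily; requires nltk and its POS tagger data

    tokens = sentence.split()  # naive tokenization, an assumption of this sketch
    return [tag for _, tag in nltk.pos_tag(tokens)]


def tag_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two tag sequences (0 means identical)."""
    prev = list(range(len(b) + 1))
    for i, tag_a in enumerate(a, start=1):
        curr = [i]
        for j, tag_b in enumerate(b, start=1):
            cost = 0 if tag_a == tag_b else 1
            curr.append(min(prev[j] + 1,          # delete a tag from a
                            curr[j - 1] + 1,      # insert a tag into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def syntactic_distance(text1: str, text2: str) -> int:
    """Edit distance between the POS tag sequences of two sentences."""
    return tag_distance(pos_tags(text1), pos_tags(text2))
```

With the tagger data installed, syntactic_distance("John likes apple", "Mike hates orange") compares the two tag sequences directly. Dividing the result by the length of the longer sequence would give a score between 0 and 1 if a normalized value is preferred.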


For semantic distance, you need a semantic lexicon such as WordNet: find the synonyms of each word in each sentence, then compute the intersection of the synonym sets of the two sentences.
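One way to sketch the synonym-intersection idea is with nltk's WordNet interface. The answer does not say how to score the intersection, so the Jaccard overlap used here is an assumption, as are the helper names synonyms, sentence_synonyms and jaccard. The WordNet corpus has to be downloaded for nltk before the lookups will work.

```python
from typing import Set


def synonyms(word: str) -> Set[str]:
    """Collect the word plus the lemma names of all its WordNet synsets."""
    from nltk.corpus import wordnet as wn  # requires the WordNet corpus data

    names = {word.lower()}
    for synset in wn.synsets(word):
        names.update(name.lower() for name in synset.lemma_names())
    return names


def sentence_synonyms(sentence: str) -> Set[str]:
    """Union of the synonym sets of every whitespace-separated word."""
    result: Set[str] = set()
    for word in sentence.split():
        result |= synonyms(word)
    return result


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)
```

A semantic similarity for two texts would then be jaccard(sentence_synonyms(text1), sentence_synonyms(text2)), and one minus that value can serve as a distance. Proper names such as John or Mike usually have little or no WordNet coverage, so filtering them out first may be worthwhile.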

Masoud answered Sep 28 '22 07:09



For the semantic part, you might want to try word2vec. A simple baseline is to average the word-to-word similarities across the two sentences, or you can come up with your own way to weight the words according to their syntactic role.

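from gensim.models import Word2Vec

# Load a previously trained model; the path is a placeholder string
model = Word2Vec.load("path/to/your/model")

# Word vectors and similarity queries are accessed through model.wv
model.wv.similarity('apple', 'orange')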
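The averaging mentioned above could look roughly like the sketch below. It takes the similarity function as a parameter so that it does not depend on a particular model. average_similarity is a hypothetical helper, and averaging over all word pairs is just one interpretation of "average the similarity of words". Pairs for which the similarity function raises KeyError (which I assume is how out-of-vocabulary words show up) are skipped.

```python
from typing import Callable, Sequence


def average_similarity(
    words1: Sequence[str],
    words2: Sequence[str],
    similarity: Callable[[str, str], float],
) -> float:
    """Mean similarity over all cross-sentence word pairs."""
    scores = []
    for w1 in words1:
        for w2 in words2:
            try:
                scores.append(float(similarity(w1, w2)))
            except KeyError:
                continue  # skip pairs containing a word the model does not know
    # Return 0.0 when no pair could be scored at all
    return sum(scores) / len(scores) if scores else 0.0
```

With a loaded gensim model this might be called as average_similarity("John likes apple".split(), "Mike hates orange".split(), model.wv.similarity). Weighting words by their syntactic role would mean replacing the plain mean with a weighted one.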
aerin answered Sep 28 '22 06:09
