I have two texts:
Text1 : John likes apple
Text2 : Mike hates orange
The two texts above are syntactically similar, but their meanings are different.
I want to find
1) Syntactic distance between 2 texts
2) Semantic distance between 2 texts
Is there any way to do this using nltk? I am a newbie to NLP.
Yes, and you are not limited to nltk. One way to get at syntactic distance is part-of-speech tagging (POS tagging), which maps each word of a sentence to a specific tag: https://en.wikipedia.org/wiki/Part-of-speech_tagging
For example, it maps your sentences to these:
Text1: Noun Verb Noun
Text2: Noun Verb Noun
Then you can measure the distance between the two tag sequences, for example with an edit distance.
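One possible way to measure that distance is a Levenshtein (edit) distance over the tag sequences. This is only a sketch: the choice of edit distance is my assumption, the tags are typed in by hand rather than produced by a tagger, and the third sequence is a made-up example to show a non-zero result.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: lists of POS tags)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

tags1 = ["Noun", "Verb", "Noun"]               # John likes apple
tags2 = ["Noun", "Verb", "Noun"]               # Mike hates orange
tags3 = ["Noun", "Verb", "Adjective", "Noun"]  # hypothetical longer sentence

print(edit_distance(tags1, tags2))
print(edit_distance(tags1, tags3))
```

A distance of zero means the two sentences share exactly the same tag pattern; in practice the tag lists would come from a POS tagger instead of being written out.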
For semantic distance, you need a semantic word net (such as WordNet): find the synonyms of each word in a sentence, then look at how much the synonym sets of the two sentences intersect.
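The synonym-intersection idea could look roughly like this. It is a sketch under assumptions: the small `synonyms` table is a hand-written toy stand-in for a real word net lookup, and scoring the intersection as a Jaccard ratio is my choice, not something the approach requires.

```python
def synonym_overlap(words1, words2, synonyms):
    """Jaccard overlap between the pooled synonym sets of two word lists."""
    # A word with no entry in the table counts as its own only synonym.
    set1 = set().union(*(synonyms.get(w, {w}) for w in words1))
    set2 = set().union(*(synonyms.get(w, {w}) for w in words2))
    union = set1 | set2
    if not union:
        return 0.0
    return len(set1 & set2) / len(union)

# Toy, hand-written synonym table (hypothetical data, for illustration only).
synonyms = {
    "likes": {"likes", "enjoys", "loves"},
    "loves": {"loves", "likes", "adores"},
    "hates": {"hates", "detests", "loathes"},
}

print(synonym_overlap(["likes"], ["hates"], synonyms))
print(synonym_overlap(["likes"], ["loves"], synonyms))
```

A score near 1 means the two sentences draw on largely the same synonyms, and a score of 0 means they share none; subtract it from 1 if you want a distance.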
For the semantic part, you might want to try word2vec. You can average the word-level similarities within the sentence, or come up with your own way to weight the words according to their syntactic role.
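from gensim.models import Word2Vec
# Load a previously trained model; the path is a placeholder and must be a string.
model = Word2Vec.load("path/to/your/model")
# Cosine similarity of the two word vectors (in gensim 4.x this lives on model.wv).
model.wv.similarity('apple', 'orange')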
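To make the averaging idea concrete, here is a minimal sketch. It assumes the two sentences have the same length and pairs their words by position, which is only one of several ways to do it. The two-dimensional vectors are made-up toy values so the example runs without a trained model; with a real model you would pass a function that wraps the model's word similarity instead of `toy_similarity`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_similarity(sentence1, sentence2, word_similarity):
    """Average the similarity of position-aligned word pairs."""
    pairs = list(zip(sentence1.split(), sentence2.split()))
    return sum(word_similarity(a, b) for a, b in pairs) / len(pairs)

# Toy vectors (hypothetical values, not taken from any trained model).
vectors = {
    "John": [1.0, 1.0], "Mike": [1.0, 1.0],
    "likes": [0.0, 1.0], "hates": [0.0, -1.0],
    "apple": [1.0, 0.0], "orange": [0.8, 0.6],
}

def toy_similarity(a, b):
    return cosine(vectors[a], vectors[b])

score = average_similarity("John likes apple", "Mike hates orange", toy_similarity)
print(round(score, 3))
```

Weighting the pairs, for example giving the verb more weight than the nouns, would only change the final averaging step.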