A common natural language processing task is to calculate the similarity between two words using WordNet. I start my question with the following Python code:
from nltk.corpus import wordnet
sport = wordnet.synsets("sport")[0]
badminton = wordnet.synsets("badminton")[0]
print(sport.wup_similarity(badminton))
We will get 0.8421.
Now what if I look up "haha" and "lol" as follows:
haha = wordnet.synsets("haha")
lol = wordnet.synsets("lol")
print(haha)
print(lol)
We will get
[]
[]
Both lists are empty, so we cannot compute the similarity between them. What can we do in this case?
You can create a semantic space from co-occurrence matrices using a tool like Dissect (DIStributional SEmantics Composition Toolkit), and then you can measure semantic similarity between words or phrases (if you compose words).
In your case, for haha and lol, you'll need to collect those co-occurrences first.
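A minimal sketch of the idea (not Dissect itself): count co-occurrences within a fixed window over a toy corpus and compare the resulting count vectors with cosine similarity. The corpus, window size, and variable names here are made-up assumptions for illustration only.
# Sketch: distributional similarity from a word-by-word co-occurrence matrix
from collections import Counter, defaultdict
import math

corpus = [
    "haha that joke was funny".split(),
    "lol that joke was funny".split(),
    "badminton is a sport".split(),
]
window = 2  # how many neighbours on each side count as "context"

# Count how often each word appears near each other word
cooc = defaultdict(Counter)
for sent in corpus:
    for i, word in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                cooc[word][sent[j]] += 1

def cosine(u, v):
    # Cosine similarity between two sparse count vectors (Counters)
    shared = set(u) & set(v)
    num = sum(u[w] * v[w] for w in shared)
    den = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return num / den if den else 0.0

print(cosine(cooc["haha"], cooc["lol"]))    # similar contexts -> high score
print(cosine(cooc["haha"], cooc["sport"]))  # unrelated contexts -> low score
With a real corpus you would use many more sentences and typically weight the counts (e.g. with PPMI) before comparing, which is essentially what tools like Dissect do for you.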
Another thing to try is word2vec.
Word2vec itself comes in two model architectures:
CBOW (continuous bag of words): predicts a target word from its surrounding context words.
Skip-gram: the reverse of CBOW, it predicts the surrounding context words from a given target word.
For a layman's explanation of the two architectures, look at this: https://www.quora.com/What-are-the-continuous-bag-of-words-and-skip-gram-architectures-in-laymans-terms
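To make the difference concrete, here is a rough sketch (an illustrative assumption, not code from any of the linked tutorials) of the training pairs each architecture would be asked to predict for one made-up sentence:
# Sketch: what skip-gram and CBOW are trained to predict, window size 1
sentence = "haha that was funny lol".split()
window = 1

for i, center in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    # Skip-gram: predict each context word from the center word
    skipgram_pairs = [(center, c) for c in context]
    # CBOW: predict the center word from all of its context words
    cbow_pair = (context, center)
    print(center, "->", skipgram_pairs, "|", cbow_pair)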
These models are well presented here: https://www.tensorflow.org/tutorials/word2vec, and Gensim is a good Python library for doing this kind of thing.
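For example, a rough Gensim sketch (the keyword names assume Gensim 4.x, where the dimensionality parameter is vector_size; older releases call it size; the tiny corpus below is made up, and in practice you would train on a large chat or social-media corpus where "haha" and "lol" actually occur):
# Sketch: training a small word2vec model with Gensim and querying similarity
from gensim.models import Word2Vec

sentences = [
    "haha that was so funny".split(),
    "lol that was so funny".split(),
    "i play badminton every week".split(),
    "badminton is my favourite sport".split(),
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)  # sg=1 -> skip-gram
print(model.wv.similarity("haha", "lol"))
On a corpus this small the number you get is meaningless; the point is only the workflow: train (or load pretrained) vectors, then ask the model for the similarity of any two words that appear in its vocabulary.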
You can also look at TensorFlow implementations, for example this one: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/word2vec/word2vec_basic.py
Or read more about word2vec here: https://en.wikipedia.org/wiki/Word2vec