I am playing around with FastText, https://pypi.python.org/pypi/fasttext, which is quite similar to Word2Vec. Since it seems to be a pretty new library with not too many built-in functions yet, I was wondering how to extract morphologically similar words.
For example: model.similar_word("dog") -> dogs. But there is no such built-in function.
If I type
model["dog"]
I only get the vector, which could be used to compute cosine similarity:
model.cosine_similarity(model["dog"], model["dogs"])
Do I have to write some sort of loop and call cosine_similarity on all possible pairs in a text? That would take a long time!
Although a FastText model takes longer to train (the number of n-grams exceeds the number of words), it often performs better than Word2Vec and allows rare words to be represented appropriately.
The biggest benefit of FastText is that it generates better word embeddings for rare words, and even for words not seen during training, because the character n-gram vectors are shared with other words. Word2Vec and GloVe cannot do this, since they learn one vector per whole word.
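To make the n-gram sharing concrete, here is a minimal sketch of how a word can be split into character n-grams. It assumes the `<` and `>` boundary markers and the 3 to 6 length range that FastText uses by default; the function name `char_ngrams` is made up for illustration and is not part of any library.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the set of character n-grams of `word`, with boundary markers."""
    token = "<" + word + ">"  # mark the start and end of the word
    grams = set()
    for n in range(min_n, max_n + 1):
        # slide a window of width n over the marked token
        for start in range(len(token) - n + 1):
            grams.add(token[start:start + n])
    return grams


# n-grams that "dog" and "dogs" have in common
shared = char_ngrams("dog") & char_ngrams("dogs")
print(sorted(shared))
```

Because "dog" and "dogs" overlap in several n-grams, a model that builds word vectors from n-gram vectors tends to place them close together, and it can still assemble a vector for an unseen word from the n-grams it already knows.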
FastText is a free, open-source library from Facebook AI Research (FAIR) for learning word embeddings and text classification. It supports both unsupervised and supervised training for obtaining vector representations of words.
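You can install the gensim library and use it to extract the most similar words from a model you downloaded from FastText. The .vec file is in the plain-text word2vec format, so gensim can load it directly.
Use this:
import gensim
# load the FastText vectors stored in word2vec text format
model = gensim.models.KeyedVectors.load_word2vec_format('model.vec')
# returns a list of (word, cosine similarity) tuples
similar = model.most_similar(positive=['man'], topn=10)
The topn parameter sets how many neighbours are returned, here the 10 most similar words.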
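On the worry about looping over all possible pairs: a nearest-neighbour query only needs one pass over the vocabulary per query word, not a comparison of every pair. The sketch below shows the idea in plain Python on made-up two-dimensional vectors; `nearest_words` and `toy_embeddings` are hypothetical names, and libraries such as gensim typically do the same pass as a single matrix-vector product, which is much faster.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(query, embeddings, topn=10):
    """Score every other word against `query` once and keep the best `topn`."""
    q = embeddings[query]
    scored = (
        (cosine(q, vec), word)
        for word, vec in embeddings.items()
        if word != query  # skip the query word itself
    )
    return [(word, score) for score, word in heapq.nlargest(topn, scored)]


# toy vectors, invented for illustration only
toy_embeddings = {
    "dog": [1.0, 0.0],
    "dogs": [0.9, 0.1],
    "puppy": [0.8, 0.3],
    "cat": [0.0, 1.0],
}
neighbours = nearest_words("dog", toy_embeddings, topn=2)
print(neighbours)
```

With real embeddings the dictionary would hold one vector per vocabulary word, so each query costs one similarity computation per word.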