 

word2vec - get nearest words

Reading the TensorFlow word2vec model output, how can I output the words related to a specific word?

Reading the source (https://github.com/tensorflow/tensorflow/blob/r0.11/tensorflow/examples/tutorials/word2vec/word2vec_basic.py), I can see how the image is plotted.

But is there a data structure (e.g. a dictionary) created as part of training the model that gives access to the n words closest to a given word? For example, if word2vec generated this image:

[image: word2vec embedding plot, source: https://www.tensorflow.org/versions/r0.11/tutorials/word2vec/index.html]

In this image the words 'to', 'he' and 'it' are in the same cluster. Is there a function that takes 'to' as input and outputs 'he', 'it' (in this case n=2)?

asked Oct 16 '16 19:10 by blue-sky




1 Answer

This approach applies to word2vec in general: if you can save your word vectors to a text or binary file, like the pretrained Google or GloVe vectors, then all you need is gensim.
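If the vectors only exist in memory (for example the embedding matrix at the end of word2vec_basic.py), they first have to be written to such a file. Below is a minimal sketch of the plain-text word2vec format: a header line "<vocab_size> <dimensions>", then one line per word with its values. The helper name `write_word2vec_text` is hypothetical, and it assumes `reverse_dictionary` maps row index to word, as the tutorial script appears to do.

```python
def write_word2vec_text(path, reverse_dictionary, embeddings):
    """Write vectors in the plain-text word2vec format (hypothetical helper).

    Header: "<vocab_size> <dimensions>", then "<word> <v1> <v2> ..." per row.
    Assumes words contain no spaces, since space is the field separator.
    """
    dim = len(embeddings[0])
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(embeddings)} {dim}\n")
        for idx, vec in enumerate(embeddings):
            # float() also converts numpy scalars such as float32 values
            values = " ".join(repr(float(x)) for x in vec)
            f.write(f"{reverse_dictionary[idx]} {values}\n")
```

A file written this way should load with binary=False in the gensim loader.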

To install, run pip install gensim, or install from source via GitHub.

Python code:

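```python
# In current gensim the loader lives on KeyedVectors, not Word2Vec
from gensim.models import KeyedVectors

fname = "vectors.bin"  # placeholder: path to your saved word vectors
# binary=True for Google-News-style .bin files, binary=False for text files
gmodel = KeyedVectors.load_word2vec_format(fname, binary=True)
# topn must be passed by keyword: the second positional argument is `negative`
ms = gmodel.most_similar("good", topn=10)  # list of (word, cosine similarity)
for word, score in ms:
    print(word, score)
```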
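If you would rather stay inside the TensorFlow script without exporting anything, the same exact lookup (rank every word by cosine similarity to the query) can be done directly on its arrays. This is a sketch, not part of the tutorial: it assumes the script's `dictionary` (word to id), `reverse_dictionary` (id to word) and an embedding matrix with one row per word id, and `nearest_words` is a hypothetical name.

```python
import math


def nearest_words(word, n, dictionary, reverse_dictionary, embeddings):
    """Return the n words whose vectors are most cosine-similar to `word`."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    q_idx = dictionary[word]
    query = embeddings[q_idx]
    q_norm = norm(query)
    scored = []
    for idx, vec in enumerate(embeddings):
        if idx == q_idx:
            continue  # skip the query word itself
        denom = q_norm * norm(vec)
        sim = sum(a * b for a, b in zip(query, vec)) / denom if denom else 0.0
        scored.append((sim, reverse_dictionary[idx]))
    scored.sort(reverse=True)  # highest similarity first
    return [w for _, w in scored[:n]]


vocab = ["to", "he", "it", "cat", "dog"]
dictionary = {w: i for i, w in enumerate(vocab)}
reverse_dictionary = {i: w for w, i in dictionary.items()}
embeddings = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.0, 1.0], [0.1, 0.9]]
print(nearest_words("to", 2, dictionary, reverse_dictionary, embeddings))  # ['it', 'he']
```

With real data the embedding matrix is large, so a vectorized NumPy version would be much faster, but the ranking is the same.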

However, this compares the query against every word in the vocabulary. Approximate nearest neighbor (ANN) methods return results faster, at the cost of some accuracy.
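To illustrate where the speed/accuracy trade-off comes from, here is a sketch of a simple ANN index. Annoy builds a forest of random-projection trees; this swaps in a simpler relative, single-table random-hyperplane hashing (LSH), so it is not how Annoy or gensim work internally. The class and parameter names are hypothetical.

```python
import math
import random
from collections import defaultdict


class RandomHyperplaneLSH:
    """Approximate cosine nearest neighbours via random-hyperplane hashing.

    Bit k of a vector's signature records which side of random hyperplane k
    it lies on. Vectors with a small angle between them tend to share a
    signature, so a query only scores its own bucket, not the whole vocabulary.
    """

    def __init__(self, vectors, words, num_planes=8, seed=0):
        rng = random.Random(seed)
        dim = len(vectors[0])
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_planes)]
        self.vectors = vectors
        self.words = words
        self.buckets = defaultdict(list)
        for idx, vec in enumerate(vectors):
            self.buckets[self._signature(vec)].append(idx)

    def _signature(self, vec):
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0.0
                     for plane in self.planes)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / denom if denom else 0.0

    def query(self, word, n):
        q_idx = self.words.index(word)
        q_vec = self.vectors[q_idx]
        # Only words in the query's bucket are scored exactly; true neighbours
        # that hashed into another bucket are missed (the accuracy cost).
        candidates = [i for i in self.buckets[self._signature(q_vec)] if i != q_idx]
        candidates.sort(key=lambda i: self._cosine(q_vec, self.vectors[i]), reverse=True)
        return [self.words[i] for i in candidates[:n]]
```

More hyperplanes give smaller buckets and faster queries, but near neighbours are more likely to land in a different bucket; tree forests like Annoy's reduce that risk by searching several trees.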

Recent gensim versions can use Annoy to perform the ANN search; see this notebook for more information.

FLANN is another library for approximate nearest neighbor search.

answered Oct 20 '22 13:10 by Steven Du