How to run tsne on word2vec created from gensim?

Tags: scikit-learn, gensim, word2vec

I want to visualize a word2vec model created with the gensim library. I tried scikit-learn, but it seems I need to install a developer version to get t-SNE. I tried installing the developer version, but that is not working on my machine. Is it possible to modify this code to visualize a word2vec model?

tsne_python


asked Nov 14 '16 02:11

Shakti

1 Answer

You don't need a developer version of scikit-learn. t-SNE (sklearn.manifold.TSNE) is part of the regular release, so just install scikit-learn the usual way via pip or conda.
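A quick way to confirm that the stable release already includes t-SNE is to run it on a small matrix. This is only a sanity check: the data below is random, not word vectors, and perplexity=5 is an arbitrary value chosen because the sample is tiny.

```python
# Sanity check: TSNE ships with the regular scikit-learn release
# (pip install scikit-learn  or  conda install scikit-learn).
import numpy as np
import sklearn
from sklearn.manifold import TSNE

print("scikit-learn", sklearn.__version__)

# Random matrix standing in for word vectors: 30 rows ("words"), 100 dimensions.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 100))

# Recent releases require perplexity to be smaller than the number of samples.
X_demo_2d = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(X_demo)
print(X_demo_2d.shape)  # (30, 2)
```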

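To access the word vectors created by word2vec, use the vocabulary as an index into the model's word vectors (model.wv):

X = model.wv[model.wv.index_to_key]

In gensim versions before 4.0 the same lookup was written as X = model[model.wv.vocab]; gensim 4.0 removed both model.wv.vocab and indexing the model object directly.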

The following is a simple but complete example that loads some newsgroup data, applies very basic data preparation (cleaning the text and splitting it into sentences), trains a word2vec model, reduces the dimensions with t-SNE, and visualizes the output.

from gensim.models.word2vec import Word2Vec
from sklearn.manifold import TSNE
from sklearn.datasets import fetch_20newsgroups
import re
import matplotlib.pyplot as plt

# download example data (may take a while)
train = fetch_20newsgroups()

def clean(text):
    """Remove posting header, split by sentences and words, keep only letters"""
    lines = re.split(r'[?!.:]\s', re.sub(r'^.*Lines: \d+', '', re.sub('\n', ' ', text)))
    return [re.sub('[^a-zA-Z]', ' ', line).lower().split() for line in lines]

sentences = [line for text in train.data for line in clean(text)]

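# gensim >= 4.0 calls the dimensionality parameter vector_size (it was size before)
model = Word2Vec(sentences, workers=4, vector_size=100, min_count=50, window=10, sample=1e-3)

print(model.wv.most_similar('memory'))

# one row per vocabulary word, in the order of model.wv.index_to_key
X = model.wv[model.wv.index_to_key]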

tsne = TSNE(n_components=2)
X_tsne = tsne.fit_transform(X)

plt.scatter(X_tsne[:, 0], X_tsne[:, 1])
plt.show()
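A plain scatter plot does not show which point belongs to which word. The following is a minimal sketch, assuming words and X come from the gensim 4 form of the lookup (words = model.wv.index_to_key, X = model.wv[words]). It keeps only the first top_n words, which with gensim's default sorted vocabulary are the most frequent ones, runs t-SNE on them, and annotates each point. The function name plot_words, the top_n cutoff and the output filename are hypothetical choices. Random stand-in vectors are used so the snippet runs on its own.

```python
# Sketch: label each t-SNE point with its word.
# Assumes words (list of str) and X (2-D array, one row per word) in matching order;
# random stand-ins are used below so the snippet is self-contained.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE


def plot_words(words, X, top_n=200, filename="word2vec_tsne.png"):
    """Project the first top_n word vectors to 2-D, annotate each point, save the figure."""
    words = list(words)[:top_n]
    X = np.asarray(X)[:top_n]
    # perplexity has to stay below the number of points being embedded
    perplexity = min(30, len(words) - 1)
    X_2d = TSNE(n_components=2, perplexity=perplexity, random_state=0).fit_transform(X)

    fig, ax = plt.subplots(figsize=(12, 12))
    ax.scatter(X_2d[:, 0], X_2d[:, 1], s=5)
    for word, (x, y) in zip(words, X_2d):
        ax.annotate(word, (x, y), fontsize=8)
    fig.savefig(filename)
    plt.close(fig)
    return X_2d


# Stand-in data (hypothetical): 50 fake "words" with 100-dimensional vectors.
rng = np.random.default_rng(0)
demo_words = [f"word{i}" for i in range(50)]
demo_X = rng.normal(size=(50, 100))

coords = plot_words(demo_words, demo_X, top_n=40)
print(coords.shape)  # (40, 2)
```

With a real model, plot_words(model.wv.index_to_key, model.wv[model.wv.index_to_key]) would plot the 200 most frequent words; a smaller top_n keeps the labels readable.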


answered Sep 19 '22 13:09

goerlitz
