I want to analyse some text on a Google Compute server on Google Cloud Platform (GCP) using the Word2Vec model.
However, the un-compressed word2vec model from https://mccormickml.com/2016/04/12/googles-pretrained-word2vec-model-in-python/ is over 3.5GB and it will take time to download it manually and upload it to a cloud instance.
Is there any way to access this (or any other) pre-trained Word2Vec model on a Google Compute server without uploading it myself?
As an alternative to manually downloading the file, you can use a pre-packaged version (third-party, not from Google) hosted as a Kaggle dataset.
First, sign up for Kaggle and get your API credentials: https://github.com/Kaggle/kaggle-api#api-credentials
Then, do this on the command line:
pip3 install kaggle
mkdir -p $HOME/.kaggle/
echo '{"username":"****","key":"****"}' > $HOME/.kaggle/kaggle.json
chmod 600 $HOME/.kaggle/kaggle.json
kaggle datasets download -p $HOME/content alvations/vegetables-google-word2vec
unzip $HOME/content/vegetables-google-word2vec.zip -d $HOME/content
Finally, in Python:
import os
import numpy as np

home = os.environ["HOME"]
# Embedding matrix: one 300-d row per vocabulary token.
embeddings = np.load(os.path.join(home, 'content/word2vec.news.negative-sample.300d.npy'))
# The accompanying .txt file lists the tokens, one per line, in row order.
with open(os.path.join(home, 'content/word2vec.news.negative-sample.300d.txt')) as fp:
    tokens = [line.strip() for line in fp]
# Look up a word's vector by its row index.
embeddings[tokens.index('hello')]
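Once you have the `embeddings` matrix and `tokens` list, nearest-neighbour queries are a couple of lines of NumPy. A minimal sketch with a toy four-word vocabulary standing in for the real files (the tokens and vectors here are made up for illustration):

```python
import numpy as np

# Toy stand-ins for the real tokens/embeddings loaded above.
tokens = ["hello", "world", "potato", "carrot"]
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.2],
])

def most_similar(word, topn=2):
    """Return the topn nearest tokens to `word` by cosine similarity."""
    q = embeddings[tokens.index(word)]
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q)
    sims = embeddings @ q / norms
    order = np.argsort(-sims)
    return [(tokens[i], float(sims[i])) for i in order if tokens[i] != word][:topn]

print(most_similar("hello"))  # "world" ranks first: its vector is nearly parallel
```

For the real 3M-word Google News matrix, consider `np.load(..., mmap_mode='r')` so the 3.5GB file is memory-mapped instead of read into RAM at once.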
Full example on Colab: https://colab.research.google.com/drive/178WunB1413VE2SHe5d5gc0pqAd5v6Cpl
P.S. To access other pre-packaged word embeddings, see https://github.com/alvations/vegetables
You can also use Gensim to download pre-trained models through its downloader API:
import gensim.downloader as api
path = api.load("word2vec-google-news-300", return_path=True)
print(path)
or from the command line:
python -m gensim.downloader --download <dataname> # same as api.load(dataname, return_path=True)
For a list of available datasets, check: https://github.com/RaRe-Technologies/gensim-data