 

Import GoogleNews-vectors-negative300.bin


I am working on code that uses gensim, and I am having a tough time troubleshooting a ValueError. I finally managed to unzip the GoogleNews-vectors-negative300.bin.gz file so that I could use it in my model. I also tried gzip, but without success. The error occurs on the last line of the code. What can be done to fix it? Are there any workarounds? Finally, is there a website I could reference?

Thank you respectfully for your assistance!

import gensim
from keras import backend
from keras.layers import Dense, Input, Lambda, LSTM, TimeDistributed
from keras.layers.merge import concatenate
from keras.layers.embeddings import Embedding
from keras.models import Model

pretrained_embeddings_path = "GoogleNews-vectors-negative300.bin"
word2vec = gensim.models.KeyedVectors.load_word2vec_format(pretrained_embeddings_path, binary=True)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-23bd96c1d6ab> in <module>()
      1 pretrained_embeddings_path = "GoogleNews-vectors-negative300.bin"
----> 2 word2vec = gensim.models.KeyedVectors.load_word2vec_format(pretrained_embeddings_path, binary=True)

C:\Users\green\Anaconda3\envs\py35\lib\site-packages\gensim\models\keyedvectors.py in load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
    244                             word.append(ch)
    245                     word = utils.to_unicode(b''.join(word), encoding=encoding, errors=unicode_errors)
--> 246                     weights = fromstring(fin.read(binary_len), dtype=REAL)
    247                     add_word(word, weights)
    248             else:

ValueError: string size must be a multiple of element size
asked Sep 26 '17 at 18:09 by Green




1 Answer

The following commands work (the first one installs wget with Homebrew):

brew install wget
wget -c "https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz"
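Before decompressing, it may help to confirm that the download is complete, since a cut-off file is one plausible cause of the ValueError in the question. Below is a minimal sketch using only Python's standard library; gzip_status is a hypothetical helper name, not part of gensim or any tool. It checks for the gzip magic bytes (1f 8b) and then reads the whole compressed stream, because Python's gzip module raises EOFError when a stream ends before its end-of-stream marker.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # every gzip file starts with these two bytes


def gzip_status(path, chunk_size=1 << 20):
    """Return "not gzip", "truncated", "corrupt", or "ok" for a file.

    Hypothetical helper: it decompresses the whole stream in chunks and
    discards the output, so memory use stays small even for large archives.
    """
    with open(path, "rb") as raw:
        if raw.read(2) != GZIP_MAGIC:
            return "not gzip"
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):  # read to the end of the stream
                pass
    except EOFError:
        return "truncated"  # stream ended before the end-of-stream marker
    except (OSError, zlib.error):
        return "corrupt"  # e.g. CRC mismatch or invalid compressed data
    return "ok"
```

For example, gzip_status("GoogleNews-vectors-negative300.bin.gz") should return "ok" before you move on. If it returns "truncated", rerunning the wget -c command may resume the interrupted download.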

This downloads the gzip-compressed file (the -c flag lets wget resume a partially downloaded file instead of starting over), which you can decompress with:

gzip -d GoogleNews-vectors-negative300.bin.gz 
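The asker's traceback shows a Windows Anaconda environment, where the gzip command may not be available. A rough Python equivalent is sketched below using only the standard library; gunzip is a hypothetical helper name. It streams the data in chunks rather than loading the whole file into memory, and unlike gzip -d it leaves the original .gz file in place.

```python
import gzip
import shutil


def gunzip(src, dst):
    """Decompress src (a .gz file) into dst, streaming in chunks.

    Hypothetical helper, roughly what `gzip -d` does, except that the .gz
    file is kept. Raises EOFError if src is truncated.
    """
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out, length=1 << 20)  # copy in 1 MiB chunks
```

Usage: gunzip("GoogleNews-vectors-negative300.bin.gz", "GoogleNews-vectors-negative300.bin").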

You can then load the word vectors with gensim:

from gensim import models

# binary=True because the GoogleNews file is stored in word2vec's binary format
w = models.KeyedVectors.load_word2vec_format(
    '../GoogleNews-vectors-negative300.bin', binary=True)
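If loading still fails with "ValueError: string size must be a multiple of element size", note that the traceback in the question fails at the line where gensim reads binary_len bytes for one vector and passes them to fromstring. Receiving fewer bytes than that, for example because the file was cut off, is one way to trigger this error. The sketch below is a hypothetical standalone checker (find_truncation is not a gensim function) that walks the file the same way, assuming the standard word2vec binary layout with 4-byte float32 values, and reports the first incomplete entry. It reads one byte at a time, so expect it to be slow on a file with millions of entries.

```python
def find_truncation(path):
    """Return the index of the first incomplete entry, or None if complete.

    Hypothetical helper (not part of gensim). Assumes the standard word2vec
    binary layout: a header line "<vocab_size> <vector_size>", then for each
    entry the word's bytes, a space, and vector_size float32 values.
    """
    float_size = 4  # bytes per float32 value
    with open(path, "rb") as f:
        vocab_size, vector_size = map(int, f.readline().split())
        vec_len = vector_size * float_size
        for i in range(vocab_size):
            # Skip the word's bytes (and any newline left after the previous
            # vector) up to the separating space.
            while True:
                ch = f.read(1)
                if ch == b"":
                    return i  # file ended before this word was complete
                if ch == b" ":
                    break
            # A short read here is the situation in which gensim's
            # fromstring call can fail.
            if len(f.read(vec_len)) != vec_len:
                return i
    return None
```

A result of None means every entry announced in the header is present. An integer means the file stops partway through that entry, which suggests downloading and decompressing it again.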
answered Oct 12 '22 at 11:10 by ohsoifelse