I am working on code that uses gensim and am having a tough time troubleshooting a ValueError. I was finally able to unzip the GoogleNews-vectors-negative300.bin.gz file so I could use it in my model. I also tried gzip, but that was unsuccessful. The error occurs on the last line of the code. What can be done to fix it? Are there any workarounds? Finally, is there a website I could reference?
Thank you for your assistance!
import gensim
from keras import backend
from keras.layers import Dense, Input, Lambda, LSTM, TimeDistributed
from keras.layers.merge import concatenate
from keras.layers.embeddings import Embedding
from keras.models import Model

pretrained_embeddings_path = "GoogleNews-vectors-negative300.bin"
word2vec = gensim.models.KeyedVectors.load_word2vec_format(pretrained_embeddings_path, binary=True)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-23bd96c1d6ab> in <module>()
      1 pretrained_embeddings_path = "GoogleNews-vectors-negative300.bin"
----> 2 word2vec = gensim.models.KeyedVectors.load_word2vec_format(pretrained_embeddings_path, binary=True)

C:\Users\green\Anaconda3\envs\py35\lib\site-packages\gensim\models\keyedvectors.py in load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
    244                         word.append(ch)
    245                     word = utils.to_unicode(b''.join(word), encoding=encoding, errors=unicode_errors)
--> 246                     weights = fromstring(fin.read(binary_len), dtype=REAL)
    247                     add_word(word, weights)
    248                 else:

ValueError: string size must be a multiple of element size
GoogleNews-vectors-negative300.bin is a pre-trained word2vec model from Google, commonly used for tasks such as sentiment analysis.
The commands below work.
brew install wget
wget -c "https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz"
This downloads the GZIP compressed file that you can uncompress using:
gzip -d GoogleNews-vectors-negative300.bin.gz
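If the gzip command is not available (the traceback in the question comes from a Windows machine), the same decompression can be sketched with Python's standard library. This is only an illustration: the helper names is_gzip_file and gunzip are made up for this example and are not part of gensim.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def is_gzip_file(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def gunzip(src_path, dst_path):
    """Decompress the gzip file `src_path` into `dst_path`, streaming in chunks."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Calling gunzip("GoogleNews-vectors-negative300.bin.gz", "GoogleNews-vectors-negative300.bin") should produce the same file as gzip -d, without deleting the archive. Running is_gzip_file on the resulting .bin is a quick way to confirm it is no longer compressed.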
You can then use the code below to load the word vectors.
from gensim import models

w = models.KeyedVectors.load_word2vec_format(
    '../GoogleNews-vectors-negative300.bin', binary=True)
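The ValueError in the question is raised when the bytes read for one vector are not a whole number of 4-byte floats, which usually points to a truncated, corrupted, or still-compressed file. Before loading, a rough sanity check can help. The sketch below assumes the file follows the word2vec binary layout (an ASCII header line "vocab_size vector_size", then the entries), and the helper names are hypothetical.

```python
import os


def read_word2vec_header(path):
    """Return (vocab_size, vector_size) parsed from the first line of a word2vec .bin file."""
    with open(path, "rb") as f:
        header = f.readline()
    # A ValueError here suggests the file is not a decompressed word2vec binary.
    vocab_size, vector_size = (int(x) for x in header.split())
    return vocab_size, vector_size


def looks_complete(path, bytes_per_float=4):
    """Rough check: is the file at least as large as its vectors alone would require?"""
    vocab_size, vector_size = read_word2vec_header(path)
    min_size = vocab_size * vector_size * bytes_per_float
    return os.path.getsize(path) >= min_size
```

For the GoogleNews file (3 million words, 300 dimensions) the lower bound works out to 3,000,000 × 300 × 4 bytes, about 3.6 GB. If looks_complete returns False, the download or extraction was most likely cut short, and downloading the archive again and decompressing it is the first thing to try.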