Import GoogleNews-vectors-negative300.bin


I am working on code that uses gensim, and I am having a tough time troubleshooting a ValueError. I was finally able to unzip the GoogleNews-vectors-negative300.bin.gz file so I could use it in my model; I also tried gzip, but that was unsuccessful. The error occurs in the last line of the code. What can be done to fix it? Are there any workarounds? Finally, is there a website I could reference?

Thank you respectfully for your assistance!

```python
import gensim
from keras import backend
from keras.layers import Dense, Input, Lambda, LSTM, TimeDistributed
from keras.layers.merge import concatenate
from keras.layers.embeddings import Embedding
from keras.models import Model

pretrained_embeddings_path = "GoogleNews-vectors-negative300.bin"
word2vec = gensim.models.KeyedVectors.load_word2vec_format(pretrained_embeddings_path, binary=True)
```

This raises:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-23bd96c1d6ab> in <module>()
      1 pretrained_embeddings_path = "GoogleNews-vectors-negative300.bin"
----> 2 word2vec = gensim.models.KeyedVectors.load_word2vec_format(pretrained_embeddings_path, binary=True)

C:\Users\green\Anaconda3\envs\py35\lib\site-packages\gensim\models\keyedvectors.py in load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
    244                             word.append(ch)
    245                     word = utils.to_unicode(b''.join(word), encoding=encoding, errors=unicode_errors)
--> 246                     weights = fromstring(fin.read(binary_len), dtype=REAL)
    247                     add_word(word, weights)
    248             else:

ValueError: string size must be a multiple of element size
```
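The last frame of the traceback shows gensim reading `binary_len` bytes per vector and decoding them as 32-bit floats (`REAL`). A minimal sketch, assuming only NumPy, that reproduces the same kind of failure when that read comes back short, as it does at the end of a truncated file. `read_vector`, `VECTOR_SIZE` and `BINARY_LEN` are hypothetical names, not gensim's, and `np.frombuffer` words its message slightly differently from the `np.fromstring` call in the traceback.

```python
import io

import numpy as np

VECTOR_SIZE = 300  # GoogleNews vectors have 300 dimensions
# Bytes per vector, like binary_len in the traceback: 4 bytes per float32 value.
BINARY_LEN = np.dtype(np.float32).itemsize * VECTOR_SIZE


def read_vector(fin):
    """Hypothetical helper: read one raw vector and decode it as float32."""
    raw = fin.read(BINARY_LEN)  # returns fewer bytes if the file ends early
    # np.frombuffer replaces the deprecated binary mode of np.fromstring.
    return np.frombuffer(raw, dtype=np.float32)


# A complete record: 1200 bytes decode into 300 floats.
complete = io.BytesIO(np.ones(VECTOR_SIZE, dtype=np.float32).tobytes())
print(read_vector(complete).shape)  # (300,)

# A record cut one byte short, as at the end of an interrupted download.
truncated = io.BytesIO(b"\x00" * (BINARY_LEN - 1))
try:
    read_vector(truncated)
except ValueError as exc:
    print("ValueError:", exc)
```

If a short read happens to be a multiple of 4 bytes, no error is raised at this point, so this is a reproduction of the symptom rather than a complete integrity check.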


asked Sep 26 '17 18:09

Green

1 Answer

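This ValueError usually means the .bin file is incomplete (for example, an interrupted download or a failed decompression), so download a fresh copy and decompress it again. The commands below work. `brew install wget` assumes Homebrew on macOS and can be skipped if `wget` is already installed; `wget -c` resumes a partially downloaded file.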

```
brew install wget
wget -c "https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz"
```

This downloads the gzip-compressed file, which you can decompress with:

```
gzip -d GoogleNews-vectors-negative300.bin.gz
```
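The same decompression step can be done from Python with only the standard library. A minimal sketch; the function name `gunzip` is hypothetical. Python's `gzip` module raises `EOFError` when the archive is cut short, so an incomplete download is reported here instead of surfacing later as the ValueError inside gensim. From the shell, `gzip -t GoogleNews-vectors-negative300.bin.gz` runs a similar integrity test.

```python
import gzip
import shutil


def gunzip(src, dst):
    """Hypothetical helper: stream-decompress src (.gz) into dst, keeping src.

    Raises EOFError if src is truncated, and OSError (gzip.BadGzipFile on
    Python 3.8+) if src is not gzip data at all.
    """
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Copy in 1 MiB chunks so the multi-gigabyte output never sits in memory.
        shutil.copyfileobj(fin, fout, length=1024 * 1024)
```

Usage: `gunzip("GoogleNews-vectors-negative300.bin.gz", "GoogleNews-vectors-negative300.bin")`. Unlike `gzip -d`, this leaves the `.gz` file in place, which is handy if you need to retry.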

You can then load the word vectors with gensim:

```python
from gensim import models

w = models.KeyedVectors.load_word2vec_format(
    '../GoogleNews-vectors-negative300.bin', binary=True)
```
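If loading still fails, a minimal stdlib-only sketch can walk the file record by record and report where it runs short, without building a model. It assumes the standard word2vec C binary layout (a `vocab_size vector_size` header line, then for each word its bytes, a space, and `vector_size` 32-bit floats), similar to the loop visible in the traceback. `check_word2vec_bin` is a hypothetical helper, not part of gensim.

```python
def check_word2vec_bin(path):
    """Hypothetical helper: walk a word2vec C binary file without loading it.

    Returns (vocab_size, vector_size) when every declared record is complete;
    raises ValueError naming the first record where the file runs short.
    """
    with open(path, "rb") as fin:
        header = fin.readline().split()
        if len(header) != 2:
            raise ValueError("missing 'vocab_size vector_size' header line")
        vocab_size, vector_size = int(header[0]), int(header[1])
        binary_len = 4 * vector_size  # one 32-bit float per dimension
        for i in range(vocab_size):
            # The word is every byte up to the next space (any newline left
            # over from the previous record is consumed here too).
            while True:
                ch = fin.read(1)
                if ch == b"":
                    raise ValueError(f"file ends inside the word of record {i}")
                if ch == b" ":
                    break
            # The vector is a fixed-size block; a short read means truncation.
            if len(fin.read(binary_len)) != binary_len:
                raise ValueError(f"file ends inside the vector of record {i}")
    return vocab_size, vector_size
```

For a complete GoogleNews file this should report a vector size of 300 for roughly 3 million words; reading words one byte at a time is slow on a file that size, but it needs almost no memory. For a quicker smoke test, `load_word2vec_format` also accepts the `limit` parameter seen in the traceback's signature, which reads only the first N vectors.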

answered Oct 12 '22 11:10

ohsoifelse
