I am trying to load the pretrained word vectors from Google using the following code:
from gensim import models
w = models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)
But I am getting an error that tells me
File "C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 197, in load_word2vec_format
    result.syn0 = zeros((vocab_size, vector_size), dtype=datatype)
ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.
Could anyone suggest a possible solution? Thanks in advance.
This is likely triggered because the Python you have installed uses 32-bit addressing, and thus can't allocate arrays of the size required to load the GoogleNews vectors. Some options:
Use the limit parameter of gensim's load_word2vec_format() method to read only some of the early entries in the file. The file seems to be in most-frequent to least-frequent token order, so often the early entries are all you'll need. For example, you could try limit=500000 to read just the first 500,000 entries (instead of all 3 million).
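To make that concrete, here is a rough sketch that checks whether the interpreter is 32-bit, estimates how large the vector array would be, and loads only the first entries with limit. The helper names (pointer_bits, array_bytes, load_limited) are made up for illustration, and the size estimate assumes 4-byte float32 values with the 3 million x 300 shape mentioned above; treat it as a starting point rather than a definitive fix.

```python
import struct

# Figures from the question and answer: about 3 million vectors of
# 300 dimensions each. float32 (4 bytes per value) is an assumption.
VOCAB_SIZE = 3_000_000
VECTOR_SIZE = 300
BYTES_PER_FLOAT32 = 4


def pointer_bits():
    """Return the pointer width of the running interpreter (32 or 64)."""
    return struct.calcsize("P") * 8


def array_bytes(rows, cols=VECTOR_SIZE, itemsize=BYTES_PER_FLOAT32):
    """Bytes needed for a dense rows x cols array."""
    return rows * cols * itemsize


def load_limited(path, limit=500_000):
    """Load only the first `limit` vectors from a word2vec-format file."""
    # Imported here so the helpers above also work without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True, limit=limit)


if __name__ == "__main__":
    print("Interpreter pointer width:", pointer_bits(), "bits")
    print("Full load needs about", array_bytes(VOCAB_SIZE) / 1e9, "GB")
    print("limit=500000 needs about", array_bytes(500_000) / 1e9, "GB")
    # Uncomment once the file is in the working directory:
    # vectors = load_limited("GoogleNews-vectors-negative300.bin.gz")
```

If pointer_bits() reports 32, the full array is larger than a 32-bit process can address, which would match the error in the question. A smaller limit reduces the allocation proportionally, though a 32-bit process may still struggle with large contiguous arrays.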