ValueError: array is too big when loading GoogleNews-vectors-negative

Question

I am trying to load the pretrained word vectors from Google using the following code:

from gensim import models
w = models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)

But I get the following error:

File "C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 197, in load_word2vec_format
    result.syn0 = zeros((vocab_size, vector_size), dtype=datatype)

ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.

Could anyone suggest a possible solution? Thanks in advance.

gojomo · Accepted Answer

This is likely triggered because the Python you have installed uses 32-bit addressing, and thus can't allocate an array large enough to hold the GoogleNews vectors. Some options:

Switch to a 64-bit Python. Note that the full vector set takes 3 GB+ to load, so unless you have more than 4 GB of RAM, it will be hard to work with the full set no matter what.
Use the optional limit parameter of gensim's load_word2vec_format() method to read only some of the early entries in the file. The file seems to be in most-frequent to least-frequent token order, so often the early entries are all you'll need. For example, you could try limit=500000 to read just the first 500,000 entries (instead of all 3 million).
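
To see why both options help, here is a rough sketch of the arithmetic. It assumes about 3 million entries (the figure quoted above), 300 dimensions (from the file name), and 4 bytes per value (float32, which I believe is gensim's default dtype here). The helper names python_bits, array_bytes, and load_vectors are made up for illustration and are not part of gensim.

```python
import struct
import sys

# Assumed figures: about 3 million entries (quoted in the answer), 300
# dimensions (from the file name), 4 bytes per float32 value.
VOCAB_SIZE = 3_000_000
VECTOR_SIZE = 300
BYTES_PER_FLOAT32 = 4
MAX_SIZE_32BIT = 2**31 - 1  # value of sys.maxsize on a 32-bit Python build


def python_bits():
    """Return the pointer width of the running interpreter (32 or 64)."""
    return struct.calcsize("P") * 8


def array_bytes(rows, cols=VECTOR_SIZE, itemsize=BYTES_PER_FLOAT32):
    """Bytes needed for a dense rows x cols array."""
    return rows * cols * itemsize


def load_vectors(path, limit=None):
    """Load word2vec-format vectors, optionally only the first `limit` entries."""
    # Imported lazily so the size check above runs even without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True, limit=limit)


print("Python build:", python_bits(), "bit; sys.maxsize =", sys.maxsize)
print("Full set needs", array_bytes(VOCAB_SIZE), "bytes")
print("limit=500000 needs", array_bytes(500_000), "bytes")
print("Full set exceeds 32-bit limit:", array_bytes(VOCAB_SIZE) > MAX_SIZE_32BIT)
```

Under these assumptions the full array needs 3,600,000,000 bytes, which is more than a 32-bit build can address, while limit=500000 needs 600,000,000 bytes. A call such as load_vectors('GoogleNews-vectors-negative300.bin.gz', limit=500000) would then read only the first 500,000 entries, though a 32-bit process may still struggle to find that much contiguous memory.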

Tags:

python

gensim

Winston