I am trying to load pre-trained GloVe vectors as a word2vec model in gensim. I downloaded the GloVe file from here. I am using the following script:
from gensim import models
model = models.KeyedVectors.load_word2vec_format('glove.6B.300d.txt', binary=True)
but I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-38-e0b48b51f433> in <module>()
1 from gensim import models
----> 2 model = models.KeyedVectors.load_word2vec_format('glove.6B.300d.txt', binary=True)
/usr/local/lib/python3.6/dist-packages/gensim/models/utils_any2vec.py in <genexpr>(.0)
171 with utils.smart_open(fname) as fin:
172 header = utils.to_unicode(fin.readline(), encoding=encoding)
--> 173 vocab_size, vector_size = (int(x) for x in header.split()) # throws for invalid file format
174 if limit:
175 vocab_size = min(vocab_size, limit)
ValueError: invalid literal for int() with base 10: 'the'
What is the underlying problem? Does gensim need a specific format to be able to load it?
The GloVe format is slightly different from the format that load_word2vec_format() supports: it is missing the first-line declaration of vector count and dimensions.
There's a glove2word2vec utility script included that you can run once to convert the file:
https://radimrehurek.com/gensim/scripts/glove2word2vec.html
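For example, a minimal sketch of the one-time conversion, reusing the file name from your question (the output file name here is arbitrary):

from gensim.scripts.glove2word2vec import glove2word2vec
from gensim.models import KeyedVectors

# One-time conversion: prepends the "vocab_size vector_size" header
# line that load_word2vec_format() expects.
glove2word2vec('glove.6B.300d.txt', 'glove.6B.300d.w2v.txt')

# The converted file is plain text, so use binary=False (the default),
# not binary=True as in your original call.
model = KeyedVectors.load_word2vec_format('glove.6B.300d.w2v.txt', binary=False)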
Also, starting in Gensim 4.0.0 (currently in prerelease testing), the load_word2vec_format() method gets a new optional no_header parameter:
https://radimrehurek.com/gensim/models/keyedvectors.html?highlight=load_word2vec_format#gensim.models.keyedvectors.KeyedVectors.load_word2vec_format
If you set no_header=True, the method will deduce the count and dimensions from a preliminary scan of the file, so it can read a GloVe file with that option, but at the cost of two full-file reads instead of one. (So, you may still want to re-save the object with .save_word2vec_format(), or use the glove2word2vec script, to make future loads faster.)
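For example, a sketch assuming Gensim 4.0.0 or later (the re-saved file name is arbitrary):

from gensim.models import KeyedVectors

# no_header=True makes gensim scan the file once to deduce the vector
# count and dimensions, then read it a second time to load the vectors.
model = KeyedVectors.load_word2vec_format('glove.6B.300d.txt', binary=False, no_header=True)

# Optional: re-save in full word2vec text format (with the header line)
# so future loads need only a single pass.
model.save_word2vec_format('glove.6B.300d.w2v.txt', binary=False)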