I have downloaded the data with wget
!wget http://nlp.stanford.edu/data/glove.6B.zip
- ‘glove.6B.zip’ saved [862182613/862182613]
It is saved as a zip archive, and I would like to use the glove.6B.300d.txt file inside it. What I want to achieve is:
embeddings_index = {}
with io.open('glove.6B.300d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
Of course, I get this error:
IOError                                   Traceback (most recent call last)
<ipython-input-47-d07cafc85c1c> in <module>()
      1 embeddings_index = {}
----> 2 with io.open('glove.6B.300d.txt', encoding='utf8') as f:
      3     for line in f:
      4         values = line.split()
      5         word = values[0]

IOError: [Errno 2] No such file or directory: 'glove.6B.300d.txt'
How can I unzip and use that file in my code above on Google Colab?
Another way to do it is as follows.
!wget http://nlp.stanford.edu/data/glove.6B.zip
After downloading, the zip file is saved in the /content directory of Google Colab.
!unzip glove*.zip
!ls
!pwd
import numpy as np

print('Indexing word vectors.')
embeddings_index = {}
# Each line holds a word followed by its vector components, separated by spaces.
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
print('Found %s word vectors.' % len(embeddings_index))
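If only part of the vocabulary is needed, the same loop can be wrapped in a function that filters while reading, which keeps the dictionary smaller. This is just a minimal sketch: `load_glove` and its `vocab` parameter are names made up for illustration, not part of any library.

```python
import numpy as np


def load_glove(path, vocab=None):
    """Read a GloVe text file into a {word: vector} dict.

    `vocab` is an optional set of words to keep (hypothetical parameter);
    when it is None, every line in the file is loaded.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            values = line.split()
            if not values:
                continue  # skip blank lines
            word = values[0]
            if vocab is not None and word not in vocab:
                continue
            embeddings[word] = np.asarray(values[1:], dtype="float32")
    return embeddings
```

For example, `load_glove('glove.6B.100d.txt', vocab={'king', 'queen'})` would keep only those two words, assuming they appear in the file.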
!pip install --upgrade pip
!pip install -U -q pydrive
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
!mkdir -p drive
!google-drive-ocamlfuse drive
import pickle
# Save the index to the mounted Drive so it can be reloaded in later sessions.
with open('drive/path/to/your/file/location', 'wb') as out:
    pickle.dump({'embeddings_index': embeddings_index}, out)
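To get the index back in a later session, the pickle has to be read again from the same location. A small sketch of the round trip, assuming the file was written by you (unpickling files from untrusted sources is unsafe); `save_index` and `load_index` are hypothetical helper names:

```python
import pickle


def save_index(obj, path):
    """Write `obj` to `path` with pickle, closing the file afterwards."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(path):
    """Read back an object previously written by save_index."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With the dump above, `load_index('drive/path/to/your/file/location')['embeddings_index']` would return the dictionary again.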
If you have already downloaded the zip file to your local system, extract it, upload the file for the dimension you need to Google Drive, mount Drive with FUSE as above, and then open the file using the appropriate path to build the index.
Alternatively, if the file is already on your local system, you can upload it from code in Colab:
from google.colab import files
files.upload()
Select the file and use it as in step 3 onwards.
This is how you can work with GloVe word embeddings in Google Colaboratory. Hope it helps.
It's simple; check out this older post from SO.
import zipfile
zip_ref = zipfile.ZipFile(path_to_zip_file, 'r')
zip_ref.extractall(directory_to_extract_to)
zip_ref.close()
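Since the question only needs one file from the archive, `extractall` can be swapped for `ZipFile.extract` so the other dimension files are not unpacked. A rough sketch; `extract_member` is a made-up helper name:

```python
import zipfile


def extract_member(zip_path, member, dest_dir="."):
    """Extract a single file from a zip archive and return its path on disk."""
    with zipfile.ZipFile(zip_path) as zf:
        if member not in zf.namelist():
            raise FileNotFoundError(f"{member!r} not found in {zip_path!r}")
        return zf.extract(member, path=dest_dir)
```

In the notebook this would be called as `extract_member('glove.6B.zip', 'glove.6B.300d.txt')`, after which the file can be opened by the returned path.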
If you have Google Drive, you can:
Mount your Google Drive so that it can be used from the Colab notebook:
from google.colab import drive
drive.mount('/content/gdrive')
Download glove.6B.zip and extract it to a place of your choice on your Google Drive, for example
"My Drive/Place/Of/Your/Choice/glove.6B.300d.txt"
Open the file directly from your Colab notebook (note that the mounted path includes "My Drive"):
with io.open('/content/gdrive/My Drive/Place/Of/Your/Choice/glove.6B.300d.txt', encoding='utf8') as f:
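If you would rather keep only the zip on Drive, the vectors can also be read straight out of the archive without extracting it. This is only a sketch under that assumption; `iter_glove_from_zip` is a hypothetical name, and the zip location on Drive is whatever you chose.

```python
import io
import zipfile

import numpy as np


def iter_glove_from_zip(zip_path, member):
    """Yield (word, vector) pairs from one text file inside a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            # zf.open returns a binary stream; wrap it to decode UTF-8 lines.
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                values = line.split()
                if not values:
                    continue  # skip blank lines
                yield values[0], np.asarray(values[1:], dtype="float32")
```

The index is then `dict(iter_glove_from_zip(path_to_zip, 'glove.6B.300d.txt'))`, where `path_to_zip` points at glove.6B.zip on the mounted Drive.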