How to store Word vector Embeddings?

I am using BERT word embeddings for a sentence classification task with 3 labels, and I am coding in Google Colab. Since I have to re-run the embedding step every time I restart the kernel, is there a way to save these word embeddings once they are generated? Generating them takes a lot of time.

The code I am using to generate the BERT word embeddings is:

embeddings = [get_features(text) for text in text_list]

Here, get_features is a function that returns the word embeddings for each text in my list text_list.

I read that converting the embeddings into numpy arrays and then using np.save can do it, but I don't actually know how to code it.

asked Mar 03 '23 by PeakyBlinder


1 Answer

You can save your embedding data to a numpy file with these steps:

import numpy as np

all_embeddings = here_is_your_function_return_all_data()  # e.g. your get_features loop
all_embeddings = np.array(all_embeddings)   # convert the list of embeddings to a numpy array
np.save('embeddings.npy', all_embeddings)   # write the array to disk

If you're working in Google Colab, you can download the file to your local computer, and whenever you need it again, upload it back (see the sketch below) and load it:
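A minimal sketch of that download/upload round trip, assuming the google.colab file helpers available in a standard Colab notebook:

from google.colab import files

# At the end of a session, download the saved embeddings to your machine.
files.download('embeddings.npy')

# In a new session, upload the file back before loading it.
files.upload()  # choose embeddings.npy in the file picker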

all_embeddings = np.load('embeddings.npy')
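One caveat, which depends on your data: if get_features returns arrays of different lengths per sentence, np.array produces an object array, and loading it then needs allow_pickle=True:

all_embeddings = np.load('embeddings.npy', allow_pickle=True)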

That's it.

By the way, you can also save the file directly to Google Drive, for example by mounting it as sketched below.
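A minimal sketch of the Drive route, assuming you mount your Drive in the notebook; the path under MyDrive is only an example:

from google.colab import drive
import numpy as np

# Mount Google Drive into the Colab filesystem.
drive.mount('/content/drive')

# Save directly to Drive so the file survives kernel restarts.
np.save('/content/drive/MyDrive/embeddings.npy', all_embeddings)

# Later sessions can load it from the same path after mounting again.
all_embeddings = np.load('/content/drive/MyDrive/embeddings.npy')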

answered Mar 12 '23 by Nazmul Hasan