
Improving on the basic, existing GloVe model

I am using GloVe as part of my research. I've downloaded the models from here. I've been using GloVe for sentence classification. The sentences I'm classifying are specific to a particular domain, say some STEM subject. However, since the existing GloVe models are trained on a general corpus, they may not yield the best results for my particular task.

So my question is: how would I go about loading the pretrained model and retraining it a little more on my own corpus, so that it learns the semantics of my corpus as well? If this were possible, it would be worth doing.

cs95 asked Apr 25 '17 18:04


2 Answers

After a little digging, I found this issue on the git repo. Someone suggested the following:

Yeah, this is not going to work well due to the optimization setup. But what you can do is train GloVe vectors on your own corpus and then concatenate those with the pretrained GloVe vectors for use in your end application.

So that answers that.
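
As a rough illustration of the concatenation suggested in that issue, here is a minimal sketch in plain Python. It assumes both vector sets are in GloVe's text format (one word per line, followed by space-separated floats). The function names `parse_glove_lines` and `concat_embeddings`, and the choice to zero-pad words that are missing from one set, are my own assumptions and not something taken from the GloVe repo.

```python
def parse_glove_lines(lines):
    """Parse GloVe text-format lines ("word v1 v2 ...") into {word: [floats]}."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def concat_embeddings(pretrained, domain):
    """Concatenate two embedding tables word by word.

    A word missing from one table gets zeros for that half. This fallback
    is an assumption; a mean vector or a random vector would also work.
    """
    dim_p = len(next(iter(pretrained.values())))
    dim_d = len(next(iter(domain.values())))
    combined = {}
    for word in set(pretrained) | set(domain):
        left = pretrained.get(word, [0.0] * dim_p)
        right = domain.get(word, [0.0] * dim_d)
        combined[word] = left + right  # list concatenation, not addition
    return combined


# Toy data standing in for the pretrained and the domain-specific vectors.
pretrained = parse_glove_lines(["cell 0.1 0.2 0.3", "the 0.4 0.5 0.6"])
domain = parse_glove_lines(["cell 0.9 0.8", "mitosis 0.7 0.6"])
combined = concat_embeddings(pretrained, domain)
```

With real files you would pass an open file handle (for example `open(path, encoding="utf-8")`) instead of the toy lists. The combined vectors have dimension `dim_p + dim_d`, so the downstream classifier's input size has to change accordingly.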

cs95 answered Sep 28 '22 18:09


I believe GloVe (Global Vectors) is not meant to be appended to, since it is fit to the overall word co-occurrence statistics of a single corpus, and those statistics are known only at initial training time.
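
For reference, the weighted least-squares objective from the GloVe paper makes this dependence explicit. Here $X_{ij}$ is the number of times word $j$ occurs in the context of word $i$ over the whole corpus, $V$ is the vocabulary size, $w_i$ and $\tilde{w}_j$ are the word and context vectors, $b_i$ and $\tilde{b}_j$ are biases, and $f$ is a weighting function:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
```

Every term depends on the counts $X_{ij}$, so adding new text changes the counts the vectors were fit to, and words that appear only in the new text have no entries in $X$ at all.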

What you can do is use the gensim.scripts.glove2word2vec API to convert the GloVe vectors into word2vec format, but I don't think you can continue training from there, since the result loads into a KeyedVectors object, not a full model.
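
To make the format difference concrete: as far as I know, the word2vec text format is the GloVe text format plus a header line of the form `<vocab size> <dimensions>`, and the conversion amounts to adding that header. The sketch below does this in plain Python without gensim. `glove_to_word2vec_lines` is a hypothetical helper name for illustration, not a gensim function.

```python
def glove_to_word2vec_lines(glove_lines):
    """Return word2vec text-format lines for the given GloVe text-format lines.

    The only change is a leading "<vocab_size> <dimensions>" header line.
    """
    rows = [line.rstrip("\n") for line in glove_lines if line.strip()]
    if not rows:
        raise ValueError("no vectors found")
    dim = len(rows[0].split(" ")) - 1  # first token is the word itself
    return [f"{len(rows)} {dim}"] + rows


# Toy input standing in for the lines of a GloVe vectors file.
glove_lines = ["cell 0.1 0.2 0.3\n", "the 0.4 0.5 0.6\n"]
w2v_lines = glove_to_word2vec_lines(glove_lines)
```

The result is still only a table of vectors. It carries none of the extra state (output-layer weights, word frequencies) that a word2vec model would need in order to resume training, which is consistent with the KeyedVectors limitation above. I believe newer gensim releases can also read the headerless GloVe file directly with `KeyedVectors.load_word2vec_format(path, binary=False, no_header=True)`.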

StevenWernerCS answered Sep 28 '22 16:09