
Merging pretrained models in Word2Vec?

I have downloaded the Google News pretrained vector file (trained on about 100 billion words). On top of that, I am also training on my own 3 GB of data, producing another pretrained vector file. Both have 300 feature dimensions and are more than 1 GB in size.

How do I merge these two huge sets of pre-trained vectors? Or how do I train a new model and update its vectors on top of another? I see that the C-based word2vec does not support batch training.

I am looking to compute word analogies from these two models. I believe that vectors learned from these two sources will produce pretty good results.
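For reference, the two files can at least be loaded side by side with gensim and queried independently; a minimal sketch, assuming gensim is installed and using my_vectors.bin as a placeholder name for your own model's output (this loads both, it does not merge them):

    from gensim.models import KeyedVectors

    # File names are placeholders; binary=True matches the GoogleNews .bin format.
    google = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin',
                                               binary=True)
    mine = KeyedVectors.load_word2vec_format('my_vectors.bin', binary=True)

    print(google.vector_size, mine.vector_size)  # both 300, but in unrelated coordinate spaces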

asked May 27 '15 by pbu

People also ask

How does word2vec model work?

The word2vec model creates numeric vector representations of words from a training text corpus that preserve semantic and syntactic relationships. A famous example of how word2vec preserves semantics: if you subtract the vector for Man from King and add Woman, you get Queen as one of the closest results.
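As a minimal sketch of that analogy with gensim, assuming model is an already-loaded KeyedVectors instance (for example the GoogleNews vectors):

    # king - man + woman ~= queen
    print(model.most_similar(positive=['king', 'woman'], negative=['man'], topn=3))
    # 'queen' is typically among the closest results.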

How do you use cosine similarity in word2vec?

After training the word2vec model, we can use the cosine similarity of word vectors from the trained model to find the words in the dictionary that are most semantically similar to an input word:

    def get_similar_tokens(query_token, k, embed):
        W = embed.weight.data
        x = W[vocab[query_token]]
        # Compute the cosine similarity.
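The quoted snippet is clipped mid-function; a self-contained numpy sketch of the same idea (a hypothetical helper, not the library's own code):

    import numpy as np

    def most_similar(query_vec, all_vecs, k=5):
        # Cosine similarity between the query vector and every row of all_vecs;
        # the small constant guards against division by zero.
        sims = all_vecs @ query_vec / (
            np.linalg.norm(all_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
        return np.argsort(-sims)[:k]  # indices of the k most similar rows/words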

What is word2vec vector model W & glove G?

Let's call the word2vec vector model W and the GloVe model G. An embedding is just a vector, and W is a vector space. These two embeddings live in different vector spaces, so you need to align the two vector spaces, as in this paper by Mikolov.

What is a word2vec architecture?

The word2vec architectures were proposed by a team of researchers led by Tomas Mikolov at Google in 2013. The model creates numeric vector representations of words from a training text corpus that preserve semantic and syntactic relationships.



1 Answer

There's no straightforward way to merge the end-results of separate training sessions.

Even for the exact same data, slight randomization from initial seeding or thread scheduling jitter will result in diverse end states, making vectors only fully comparable within the same session.

This is because every session finds a useful configuration of vectors... but there are many equally useful configurations, rather than a single best.

For example, whatever final state you reach has many rotations/reflections that can be exactly as good on the training prediction task, or perform exactly as well on some other task (like analogies-solving). But most of these possible alternatives will not have coordinates that can be mixed-and-matched for useful comparisons against each other.
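A small numpy illustration of this point, using random placeholder vectors rather than real embeddings: any orthogonal rotation leaves every within-model cosine similarity (and hence analogy behaviour) unchanged, while making the raw coordinates incomparable across the two versions.

    import numpy as np

    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(1000, 300))   # stand-in for one model's word vectors

    # Random orthogonal matrix via QR decomposition.
    Q, _ = np.linalg.qr(rng.normal(size=(300, 300)))
    rotated = vecs @ Q                    # the "same" model in rotated coordinates

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Within-model similarities are preserved exactly (up to float error)...
    print(np.isclose(cosine(vecs[0], vecs[1]), cosine(rotated[0], rotated[1])))  # True
    # ...but a word's vector in one version is not close to its vector in the other.
    print(cosine(vecs[0], rotated[0]))    # an arbitrary value, nowhere near 1.0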

Preloading your model with data from prior training runs might improve the results after more training with new data, but I'm not aware of any rigorous testing of this possibility. The effect likely depends on your specific goals, your parameter choices, and how much the new and old data are similar, or representative of the eventual data against which the vectors will be used.

For example, if the Google News corpus is unlike your own training data, or the text you'll be using the word-vectors to understand, using it as a starting point might just slow or bias your training. On the other hand, if you train on your new data long enough, eventually any influence of the preloaded values could be diluted to nothingness. (If you really wanted a 'blended' result, you might have to simultaneously train on the new data with an interleaved goal for nudging the vectors back towards the prior-dataset values.)
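For what it's worth, here is a sketch of how such preloading could be tried, assuming gensim 3.x, where Word2Vec.intersect_word2vec_format and its lockf argument are available (the method's location and behaviour differ in gensim 4.x), and assuming sentences is your own corpus as an iterable of token lists:

    from gensim.models import Word2Vec

    model = Word2Vec(size=300, min_count=5)
    model.build_vocab(sentences)

    # Seed words that also appear in the GoogleNews file with the pretrained vectors;
    # lockf=1.0 lets further training keep updating them (0.0 would freeze them).
    model.intersect_word2vec_format('GoogleNews-vectors-negative300.bin',
                                    binary=True, lockf=1.0)

    model.train(sentences, total_examples=model.corpus_count, epochs=model.epochs)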

Ways to combine the results from independent sessions might make a good research project. Maybe the method used in the word2vec language-translation projects – learning a projection between vocabulary spaces – could also 'translate' between the different coordinates of different runs. Maybe locking some vectors in place, or training on the dual goals of 'predict the new text' and 'stay close to the old vectors' would give meaningfully improved combined results.
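As a rough illustration of the projection idea, with random placeholder data standing in for the vectors of words shared by both models (the translation-matrix approach of Mikolov et al., applied across runs instead of languages):

    import numpy as np

    # Rows are vectors for the same N shared-vocabulary words, one matrix per model.
    N, d = 5000, 300
    rng = np.random.default_rng(1)
    A = rng.normal(size=(N, d))   # placeholder for vectors from model/run A
    B = rng.normal(size=(N, d))   # placeholder for the same words' vectors from model/run B

    # Least-squares linear map M such that A @ M approximates B.
    M, *_ = np.linalg.lstsq(A, B, rcond=None)

    # Any model-A vector can then be projected into model B's coordinate system:
    projected = A[0] @ M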

answered Sep 23 '22 by gojomo