
How to fine-tune word2vec when training a CNN for text classification?


I have three questions about fine-tuning word vectors. Please help me out; I would really appreciate it. Many thanks in advance!

  1. When I train my own CNN for text classification, I use word2vec to initialize the word vectors and then feed these pre-trained vectors to the CNN directly as input features. Since the model has no embedding layer, it surely cannot fine-tune the vectors through back-propagation. My question is: if I want to fine-tune them, does that mean I have to create an embedding layer? And how do I create one?

  2. Word2vec is trained in an unsupervised way, right? In my case, I used the skip-gram model to get my pre-trained vectors. Now that I have vec.bin and use it to initialize the words in the text classification model (CNN), if I want to fine-tune the word-to-vector map in vec.bin, does that mean the CNN must have exactly the same network structure as the one used to train word2vec? And would fine-tuning change vec.bin itself, or only the copy in memory?

  3. Are the skip-gram and CBOW models used only for unsupervised word2vec training, or can they also be applied to general text classification tasks? And how does the network used for unsupervised word2vec training differ from the one used for supervised fine-tuning?

@Franck Dernoncourt thank you for reminding me. I'm green here, and hope to learn something from the powerful community. Please have a look at my questions when you have time, thank you again!

Prince of Persia asked Oct 20 '16 00:10





1 Answer

1) What you need is an example of using pre-trained word embeddings with a trainable or fixed embedding layer, as in the code below. In Keras, this layer is updated during training by default; to exclude it from training, set trainable to False.

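from keras.layers import Embedding

# embedding_matrix: (nb_words + 1, EMBEDDING_DIM) array of word2vec vectors
embedding_layer = Embedding(nb_words + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],  # initialize from word2vec
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=True)  # False keeps the vectors fixed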
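
The embedding_matrix passed as weights above is not defined in the answer. Here is a minimal sketch of one way it might be built, assuming a word-to-index map from your tokenizer with indices starting at 1 (so row 0 is left for padding, which is why the layer uses nb_words + 1 rows). The names pretrained, word_index and build_embedding_matrix are hypothetical, and the toy vectors stand in for the ones you would load from vec.bin (for instance with gensim's KeyedVectors.load_word2vec_format).

```python
import numpy as np

EMBEDDING_DIM = 4  # toy size for illustration; use the dimension of your vec.bin

# Hypothetical stand-ins: `pretrained` plays the role of the vectors loaded
# from vec.bin, `word_index` the role of the tokenizer vocabulary (1-based).
pretrained = {
    "good": np.array([0.1, 0.2, 0.3, 0.4]),
    "bad": np.array([-0.1, -0.2, -0.3, -0.4]),
}
word_index = {"good": 1, "bad": 2, "unseen": 3}


def build_embedding_matrix(word_index, pretrained, dim):
    """Return a matrix whose row i is the vector of the word with index i.

    Row 0 is reserved for padding. Words without a pre-trained vector keep
    an all-zero row (random initialization is another common choice).
    """
    nb_words = len(word_index)
    matrix = np.zeros((nb_words + 1, dim), dtype="float32")
    for word, i in word_index.items():
        vector = pretrained.get(word)
        if vector is not None:
            matrix[i] = vector
    return matrix


embedding_matrix = build_embedding_matrix(word_index, pretrained, EMBEDDING_DIM)
```

The resulting array has one row per vocabulary entry plus the padding row, which is the shape the Embedding layer expects for its initial weights.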

2) Your word2vec vectors are used only to initialize the embedding layer; they impose no constraint on the CNN structure you use. Fine-tuning updates only the weights in memory, so vec.bin itself is not changed.
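
To make this concrete, here is a rough NumPy sketch of what fine-tuning does to the vectors. It is not Keras code: a toy logistic classifier over the mean word vector stands in for the CNN, and the table values, sgd_step and the other names are made up for illustration. The point it tries to show is that gradients from any supervised model flow into an in-memory copy of the embedding table, while the loaded pre-trained vectors stay as they were.

```python
import numpy as np

# Hypothetical pre-trained table, standing in for vectors read from vec.bin.
pretrained = np.array([[0.0, 0.0],    # index 0: padding
                       [0.5, -0.5],   # index 1
                       [-0.3, 0.8]])  # index 2

embeddings = pretrained.copy()       # in-memory weights of the embedding layer
classifier_w = np.array([1.0, -1.0]) # toy classifier on top (stand-in for the CNN)


def sgd_step(embeddings, w, token_ids, label, lr=0.1):
    """One SGD step on logistic loss; updates `embeddings` and `w` in place."""
    x = embeddings[token_ids].mean(axis=0)   # sentence vector
    p = 1.0 / (1.0 + np.exp(-(w @ x)))       # predicted probability
    grad_logit = p - label                   # d(loss)/d(logit) for cross-entropy
    grad_x = grad_logit * w                  # gradient reaching the sentence vector
    grad_w = grad_logit * x
    # Each token's row receives its share of the gradient; rows of words that
    # do not occur in the input are left untouched.
    np.subtract.at(embeddings, token_ids, lr * grad_x / len(token_ids))
    w -= lr * grad_w


sgd_step(embeddings, classifier_w, token_ids=[1], label=0.0)
```

After the step, the row for word 1 in embeddings has moved, the row for word 2 is unchanged, and pretrained still holds the original values. To keep the fine-tuned vectors, you would save the model (or the layer's weights) yourself.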

Steven Du answered Sep 26 '22 16:09
