How to fine-tune word2vec when training our CNN for text classification?

I have three questions about fine-tuning word vectors. Please help me out; I would really appreciate it. Many thanks in advance!

1) When I train my own CNN for text classification, I use Word2vec to initialize the words: I simply feed these pre-trained vectors to the CNN as input features. Since I never have an embedding layer, the vectors surely cannot be fine-tuned through back-propagation. My question is: if I want to fine-tune them, do I need to create an Embedding layer? And how do I create it?
2) When we train Word2vec, we use unsupervised training, right? In my case, I used the skip-gram model to get my pre-trained word2vec vectors. When I take the resulting vec.bin and use it as the word initializer in my text classification model (CNN), and want to fine-tune the word-to-vector map in vec.bin, does my CNN need exactly the same network structure as the one used to train Word2vec? And would fine-tuning change vec.bin itself, or only the vectors in computer memory?
3) Are the skip-gram and CBOW models used only for unsupervised Word2vec training, or can they also be applied to other general text classification tasks? And what is the difference between the network used for unsupervised Word2vec training and the one used for supervised fine-tuning?

@Franck Dernoncourt, thank you for reminding me. I'm new here and hope to learn from this great community. Please have a look at my questions when you have time. Thank you again!

asked Oct 20 '16 00:10

Prince of Persia

1 Answer

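1) What you need is a good example of using pre-trained word embeddings with an embedding layer that can be either trainable or fixed, plus the following change in the code (trainable=True). In Keras the Embedding layer is trainable by default, so its weights (initialized from your word2vec vectors) are updated by back-propagation; to exclude it from training and keep the vectors fixed, set trainable to False.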

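from keras.layers import Embedding

# nb_words, EMBEDDING_DIM, MAX_SEQUENCE_LENGTH and embedding_matrix (one word2vec
# vector per row, shape (nb_words + 1, EMBEDDING_DIM)) come from your preprocessing.
embedding_layer = Embedding(nb_words + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],       # initialize with word2vec
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=True)                   # False = keep vectors fixed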

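2) Your word2vec vectors are used only to initialize the embedding layer; they have no relation to the CNN structure you are going to use, so your CNN does not need to match the network that trained word2vec. Fine-tuning only updates the weights in memory; vec.bin itself is not modified.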
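To make both points concrete, here is a minimal NumPy sketch of what a trainable embedding layer does during fine-tuning. It is an illustration under simplifying assumptions, not Keras's actual implementation: the toy vectors in `w2v`, the `word_index` mapping, and the names `build_embedding_matrix` and `EmbeddingLayer` are hypothetical stand-ins. In practice `w2v` would hold the vectors loaded from vec.bin and `word_index` would come from your tokenizer. The matrix has `nb_words + 1` rows because index 0 is reserved for padding, which matches the first argument passed to `Embedding` in the answer's code.

```python
import numpy as np

# Hypothetical toy data: in practice `w2v` holds the vectors loaded from vec.bin
# and `word_index` maps each word to an integer id (0 is reserved for padding).
EMBEDDING_DIM = 4
w2v = {
    "good": np.array([0.1, 0.2, 0.3, 0.4]),
    "bad": np.array([-0.1, -0.2, -0.3, -0.4]),
}
word_index = {"good": 1, "bad": 2, "movie": 3}
nb_words = len(word_index)


def build_embedding_matrix(w2v, word_index, dim):
    """Row i holds the vector of the word with id i; unknown words stay all-zero."""
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, i in word_index.items():
        vec = w2v.get(word)
        if vec is not None:
            matrix[i] = vec
    return matrix


class EmbeddingLayer:
    """Illustrative embedding lookup with a `trainable` switch (not Keras's code)."""

    def __init__(self, weights, trainable=True):
        # Work on a copy, so fine-tuning never changes the initial matrix.
        self.weights = np.array(weights, dtype=float, copy=True)
        self.trainable = trainable

    def forward(self, token_ids):
        # Look up one row per token: shape (len(token_ids), dim).
        return self.weights[token_ids]

    def backward(self, token_ids, grad_out, lr=0.1):
        # One plain SGD step on the looked-up rows, given dLoss/dvectors.
        if not self.trainable:
            return  # frozen: the word2vec vectors stay fixed input features
        # np.add.at accumulates the update when a token occurs more than once.
        np.add.at(self.weights, token_ids, -lr * grad_out)


if __name__ == "__main__":
    embedding_matrix = build_embedding_matrix(w2v, word_index, EMBEDDING_DIM)
    tokens = np.array([1, 3, 1])  # "good movie good"
    layer = EmbeddingLayer(embedding_matrix, trainable=True)
    vectors = layer.forward(tokens)
    # Pretend this gradient came back from the CNN layers above the embedding.
    layer.backward(tokens, np.ones_like(vectors))
    print(layer.weights[1])     # "good" has been updated (it occurred twice)
    print(embedding_matrix[1])  # the initial word2vec row is unchanged
```

Only the rows of words that actually appear in a batch receive an update, and a frozen layer (`trainable=False`) skips the update entirely, so the word2vec vectors then act as fixed input features, as in the setup described in question 1. Because the layer updates its own copy, `embedding_matrix` and the vectors loaded from vec.bin stay unchanged; in Keras you would have to read the fine-tuned vectors back out (for example with `embedding_layer.get_weights()`) and save them yourself if you want to keep them.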

answered Sep 26 '22 16:09

Steven Du

