
Text generation: character prediction RNN vs. word prediction RNN [closed]

I've been researching text generation with RNNs, and it seems as though the common technique is to input text character by character, and have the RNN predict the next character.

Why wouldn't you use the same technique, but with words instead of characters? This seems like a much better approach to me, because the RNN won't make any typos and it will be faster to train.

Am I missing something?

Furthermore, is it possible to create a word-prediction RNN that takes as input word embeddings pre-trained with word2vec, so that the RNN can understand their meaning?

Asked Mar 08 '23 by James Dorfman


1 Answer

Why wouldn't you use the same technique, but with words instead of characters?

Word-based models are used just as often as character-based ones. See an example in this question. But there are several important differences between the two:

  • A character-based model is more flexible and can learn rarely used words and punctuation. Andrej Karpathy's post shows how effective this model can be. But this flexibility is also a downside, because such a model can sometimes produce complete nonsense.
  • Character-based models have a much smaller vocabulary, which makes them easier and faster to train. Since one-hot encoding and softmax loss work perfectly well at this scale, there's no need to complicate the model with embedding vectors or specially crafted loss functions (negative sampling, NCE, ...); a minimal character-level sketch follows this list.
  • Word-based models can't generate out-of-vocabulary (OOV) words, and they are more complex and resource-demanding. But they can learn syntactically and grammatically correct sentences and are more robust than character-based ones.
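
Here is a minimal sketch of such a character-level next-character predictor. Keras is assumed (the answer doesn't prescribe a framework), and the toy corpus, sequence length and layer sizes are placeholders; the point is that the small character vocabulary lets plain one-hot inputs and a softmax output do all the work:

    import numpy as np
    from tensorflow.keras import layers, models

    text = "hello world, hello rnn"                      # toy corpus
    chars = sorted(set(text))                            # character vocabulary
    char_to_ix = {c: i for i, c in enumerate(chars)}
    vocab_size = len(chars)
    seq_len = 10

    # Build (one-hot character window -> next character) training pairs.
    X, y = [], []
    for i in range(len(text) - seq_len):
        window = text[i:i + seq_len]
        X.append([np.eye(vocab_size)[char_to_ix[c]] for c in window])
        y.append(char_to_ix[text[i + seq_len]])
    X = np.array(X, dtype="float32")
    y = np.array(y)

    model = models.Sequential([
        layers.Input(shape=(seq_len, vocab_size)),       # one-hot characters
        layers.LSTM(128),
        layers.Dense(vocab_size, activation="softmax"),  # one logit per character
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    model.fit(X, y, epochs=5, verbose=0)

A word-level version would swap the one-hot inputs for an embedding layer over a vocabulary of tens of thousands of words, which is exactly where embedding vectors and sampled losses start to matter.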

By the way, there are also subword models, which sit somewhere in the middle. See "Subword language modeling with neural networks" by T. Mikolov et al.
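
To make the subword idea concrete, here is a toy greedy longest-match tokenizer. The subword vocabulary is hand-picked for the example; real subword models learn it from data (e.g. with byte-pair encoding). The point is that rare words decompose into frequent pieces instead of becoming out-of-vocabulary tokens:

    def subword_tokenize(word, vocab):
        """Greedily split `word` into the longest subword units found in `vocab`,
        falling back to single characters so no word is ever out-of-vocabulary."""
        pieces = []
        i = 0
        while i < len(word):
            for j in range(len(word), i, -1):            # try the longest match first
                if word[i:j] in vocab or j - i == 1:     # single chars always allowed
                    pieces.append(word[i:j])
                    i = j
                    break
        return pieces

    vocab = {"un", "believ", "able", "break", "ing"}
    print(subword_tokenize("unbelievable", vocab))       # ['un', 'believ', 'able']
    print(subword_tokenize("unbreaking", vocab))         # ['un', 'break', 'ing']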

Furthermore, is it possible to create a word-prediction RNN that takes as input word embeddings pre-trained with word2vec, so that the RNN can understand their meaning?

Yes, the example I referred to above is exactly about this kind of model.
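
As a rough sketch (not the code from the linked answer; the vectors file, vocabulary and hyperparameters are placeholders), you can load pretrained word2vec vectors with gensim and use them to initialize a frozen Embedding layer in a Keras word-level model:

    import numpy as np
    from gensim.models import KeyedVectors
    from tensorflow.keras import initializers, layers, models

    # Pretrained word2vec vectors (path is a placeholder).
    w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin",
                                            binary=True)

    vocab = ["the", "cat", "sat", "on", "mat"]           # your corpus vocabulary
    word_to_ix = {w: i for i, w in enumerate(vocab)}
    emb_dim = w2v.vector_size

    # Copy the pretrained vector for every known word; unknown words stay zero.
    emb_matrix = np.zeros((len(vocab), emb_dim), dtype="float32")
    for word, ix in word_to_ix.items():
        if word in w2v:
            emb_matrix[ix] = w2v[word]

    seq_len = 5
    model = models.Sequential([
        layers.Input(shape=(seq_len,)),                  # sequences of word indices
        layers.Embedding(len(vocab), emb_dim,
                         embeddings_initializer=initializers.Constant(emb_matrix),
                         trainable=False),               # frozen word2vec embeddings
        layers.LSTM(128),
        layers.Dense(len(vocab), activation="softmax"),  # predict the next word
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

Setting trainable=True instead lets the network fine-tune the word2vec vectors on your own corpus.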

Answered Apr 07 '23 by Maxim