
Understanding Character Level Embedding in Keras LSTM

I am new to implementing language models with RNNs in Keras. I have a dataset of isolated words (not drawn from a single paragraph) with the following statistics:

  1. Total word samples: 1953
  2. Total number of distinct characters: 33 (including START, END and *)
  3. Maximum word length: 10 characters

Now, I want to build a model that accepts a character and predicts the next character in the word. I have padded all the words so that they have the same length. So my input is Word_input with shape 1953 x 9 and the target is 1953 x 9 x 33. I also want to use an Embedding layer. So my network architecture is:

    self.wordmodel=Sequential()
    self.wordmodel.add(Embedding(33,embedding_size,input_length=9))
    self.wordmodel.add(LSTM(128, return_sequences=True))
    self.wordmodel.add(TimeDistributed(Dense(33)))
    self.wordmodel.compile(loss='mse',optimizer='rmsprop',metrics=['accuracy'])

As an example a word "CAT" with padding represents

Input to the network -- START C A T END * * * * (9 characters)

Target for the same --- C A T END * * * * * (9 characters)

So with the TimeDistributed output I measure the difference between the network's prediction and the target. I have also set batch_size to 1, so that after reading each sample word the network resets its state.
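Below is a minimal sketch of how such shifted input/target pairs could be built. The char_to_idx map and the encode helper are hypothetical names, and only a few of the 33 symbols are listed:

    import numpy as np
    from tensorflow.keras.utils import to_categorical

    # Hypothetical index map; the real one covers all 33 symbols.
    char_to_idx = {'*': 0, 'START': 1, 'END': 2, 'C': 3, 'A': 4, 'T': 5}
    seq_len, vocab = 9, 33

    def encode(word):
        # START + characters + END, padded with '*' to seq_len + 1 tokens
        tokens = ['START'] + list(word) + ['END']
        tokens += ['*'] * (seq_len + 1 - len(tokens))
        return [char_to_idx[t] for t in tokens]

    idx = np.array([encode('CAT')])        # shape (1, 10)
    X = idx[:, :-1]                        # (1, 9): START C A T END * * * *
    y = to_categorical(idx[:, 1:], vocab)  # (1, 9, 33): C A T END * * * * *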

My question is: am I doing this conceptually right? Whenever I run training, the accuracy gets stuck at about 56%.

Kindly enlighten me. Thanks.

asked Jun 16 '17 by Parthosarathi Mukherjee

People also ask

What is character level embedding?

Character-level embedding uses a one-dimensional convolutional neural network (1D-CNN) to find a numeric representation of a word by looking at its character-level composition. You can think of a 1D-CNN as several scanners sliding over a word, character by character.
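As a rough, hypothetical Keras sketch of that idea (layer sizes are arbitrary):

    from tensorflow.keras import layers, models

    max_word_len, n_chars, char_dim = 10, 33, 16

    inp = layers.Input(shape=(max_word_len,))        # character indices
    x = layers.Embedding(n_chars, char_dim)(inp)     # (10, 16) per word
    x = layers.Conv1D(64, 3, activation='relu')(x)   # "scanner" over 3-char windows
    x = layers.GlobalMaxPooling1D()(x)               # one fixed-size vector per word
    char_word_encoder = models.Model(inp, x)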

What does embedding do in LSTM?

Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words. Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space.

What might be an advantage of using character level embedding over word level embedding?

With character embeddings, a vector can be formed for every word, even an out-of-vocabulary one. Word embeddings, by contrast, can only handle words seen during training.
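A toy illustration of that difference (the tables here are made up):

    # A word-level table has no vector at all for an unseen word.
    word_table = {'cat': 0, 'dog': 1}
    'cats' in word_table                    # False -> out-of-vocabulary

    # A character-level model still maps the unseen word to known
    # character indices and can compose a vector from them.
    char_table = {c: i for i, c in enumerate('abcdefghijklmnopqrstuvwxyz')}
    [char_table[c] for c in 'cats']         # [2, 0, 19, 18]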

How does the embedding layer in Keras work?

Keras provides an embedding layer that converts each word into a fixed-length vector of defined size. The one-hot-encoding technique generates a large sparse matrix to represent a single word, whereas, in embedding layers, every word has a real-valued vector of fixed length.
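For example, a sketch with an arbitrary output dimension of 8 (assuming TensorFlow 2's Keras, where a layer can be called directly on an array):

    import numpy as np
    from tensorflow.keras.layers import Embedding

    emb = Embedding(input_dim=33, output_dim=8)    # 33 symbols -> 8-dim vectors
    idx = np.array([[1, 3, 4, 5, 2, 0, 0, 0, 0]])  # one padded word, shape (1, 9)
    print(emb(idx).shape)                          # (1, 9, 8): one vector per index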


1 Answer

To my knowledge, the structure is basically sound and may work to some degree. I have two suggestions:

  1. In the TimeDistributed layer, you should add a softmax activation, which is widely used in multi-class classification. As your structure stands, the output is unbounded, which does not match your one-hot targets.

  2. With softmax, you can change the loss to categorical cross-entropy, which increases the probability of the correct class and decreases the others. It is more appropriate here; see the sketch right after this list.
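A sketch of the question's model with both changes applied (embedding_size is a placeholder for whatever value you were using):

    from tensorflow.keras.layers import Dense, Embedding, LSTM, TimeDistributed
    from tensorflow.keras.models import Sequential

    embedding_size = 16  # placeholder; use your own value

    model = Sequential()
    model.add(Embedding(33, embedding_size, input_length=9))
    model.add(LSTM(128, return_sequences=True))
    model.add(TimeDistributed(Dense(33, activation='softmax')))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])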

You can give it a try. For a more capable model, you could try the following structure, which is given in the PyTorch tutorial. Thanks.

[model diagram from the PyTorch tutorial]

answered Nov 14 '22 by danche