I am trying to implement the type of character-level embeddings described in this paper in Keras. The character embeddings are calculated using a bidirectional LSTM.
To recreate this, I've first created a matrix containing, for each word, the indices of the characters making up the word:
from keras.preprocessing import sequence

char2ind = {char: index for index, char in enumerate(chars)}
max_word_len = max([len(word) for sentence in sentences for word in sentence])

X_char = []
for sentence in X:
    for word in sentence:
        word_chars = []
        for character in word:
            word_chars.append(char2ind[character])
        X_char.append(word_chars)

X_char = sequence.pad_sequences(X_char, maxlen=max_word_len)
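To check my understanding, this is the kind of thing the code above produces for a tiny made-up alphabet and input:

chars = ['a', 'c', 'e', 'h', 't']
char2ind = {char: index for index, char in enumerate(chars)}  # {'a': 0, 'c': 1, ...}

X = [[['t', 'h', 'e'], ['c', 'a', 't']]]   # one sentence, two words
# After the loops above:            X_char == [[4, 3, 2], [1, 0, 4]]
# After pad_sequences(maxlen=5):    [[0, 0, 4, 3, 2],
#                                    [0, 0, 1, 0, 4]]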
I then define a BiLSTM model with an embedding layer for the word-character matrix. I assume the input dimension of the embedding layer will have to be equal to the number of characters. I want a size of 64 for my character embeddings, so I set the hidden size of the BiLSTM to 32 (the forward and backward outputs concatenate to 64):
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM
hidden_size = 32  # forward + backward outputs concatenate to 64 dimensions
char_lstm = Sequential()
char_lstm.add(Embedding(len(char2ind) + 1, 64))
char_lstm.add(Bidirectional(LSTM(hidden_size, return_sequences=True)))
And this is where I get confused. How can I retrieve the embeddings from the model? I'm guessing I would have to compile the model and fit it, then retrieve the weights to get the embeddings, but what parameters should I use to fit it?
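For the retrieval part, I imagine something like this would pull out the raw embedding matrix once the model is trained and let me save it, but this is just a sketch of my guess (it assumes the model has already been compiled and fit on something):

import numpy as np

# The Embedding layer is the first layer; get_weights() returns a list whose
# first element is the (len(char2ind) + 1, 64) embedding matrix.
char_embedding_matrix = char_lstm.layers[0].get_weights()[0]

# Save for reuse in other NLP tasks (the .npy format is just an assumption).
np.save('char_embeddings.npy', char_embedding_matrix)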
Additional details:
This is for an NER task, so the dataset could technically be anything in the word-label format, although I am specifically working with the WikiGold CoNLL corpus available here: https://github.com/pritishuplavikar/Resume-NER/blob/master/wikigold.conll.txt The expected outputs from the network are the labels (I-MISC, O, I-PER...).
I expect the dataset to be large enough to train character embeddings directly from it. All words are encoded as the indices of their constituent characters; the alphabet size is roughly 200 characters. The words are padded / cut to 20 characters. There are around 30,000 different words in the dataset.
I hope to be able to learn embeddings for each character based on the information from the different words. Then, as in the paper, I would concatenate the character embeddings with the word's GloVe embedding before feeding into a BiLSTM network with a final CRF layer.
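Roughly, the full architecture I have in mind would look like the sketch below (the functional API usage, the sentence padding length, the layer sizes and the softmax standing in for the CRF are just my assumptions):

from keras.models import Model
from keras.layers import (Input, Embedding, LSTM, Bidirectional,
                          TimeDistributed, Concatenate, Dense)

max_sent_len = 50          # assumed sentence padding length
max_word_len = 20          # characters per word, as above
n_chars = 200 + 1          # rough alphabet size + padding index
n_words = 30000            # rough vocabulary size
n_labels = 9               # number of NER tags

# Character branch: a sequence of character indices per word, per sentence.
char_input = Input(shape=(max_sent_len, max_word_len))
char_emb = TimeDistributed(Embedding(n_chars, 64))(char_input)
# BiLSTM over each word's characters; its final state is the 64-dim char vector.
char_vec = TimeDistributed(Bidirectional(LSTM(32)))(char_emb)

# Word branch: would be initialised with the pretrained GloVe vectors.
word_input = Input(shape=(max_sent_len,))
word_emb = Embedding(n_words, 100)(word_input)

# Concatenate character-derived vectors with word embeddings, then BiLSTM.
combined = Concatenate()([char_vec, word_emb])
bilstm = Bidirectional(LSTM(100, return_sequences=True))(combined)

# The paper ends with a CRF layer (e.g. from keras-contrib); softmax stands in here.
output = TimeDistributed(Dense(n_labels, activation='softmax'))(bilstm)

model = Model([char_input, word_input], output)
model.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])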
I would also like to be able to save the embeddings so I can reuse them for other similar NLP tasks.
Generally speaking, the Keras approach to building models (even seemingly complex ones) is dead simple. For example, the kind of model you want to build would simply look like this (note this is for a binary classification problem):
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense

max_features, out_dims, maxlen = 20000, 128, 80  # placeholders: vocab size, embedding dim, sequence length

model = Sequential()
model.add(Embedding(max_features, out_dims, input_length=maxlen))
model.add(Bidirectional(LSTM(32)))
model.add(Dropout(0.1))
model.add(Dense(1, activation='sigmoid'))
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
This is no different from a plain vanilla NN, with the exception of having the Embedding and Bidirectional layers in place of Dense layers. This is one of the things that makes Keras amazing.
Usually it's helpful to look for a working example (Keras has loads) that does more or less the same thing you are trying to do. In this case you could first look at this model and then "reverse engineer" its workings to answer your questions. Usually things come down to formatting the data in the right way, and a working example model works wonders, as you can carefully investigate the data format it's using.
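For instance, feeding the toy model above is just padded integer sequences plus one binary label per sequence. A minimal sketch of the fit step, with made-up data:

import numpy as np
from keras.preprocessing import sequence

# Made-up index sequences and binary labels standing in for real data.
x_train = [[4, 12, 7, 3], [9, 2, 2, 15, 6]]
y_train = [0, 1]

x_train = sequence.pad_sequences(x_train, maxlen=maxlen)   # shape: (2, maxlen)
model.fit(x_train, np.asarray(y_train), batch_size=32, epochs=3)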