Understanding shapes of Keras layers


I am going through this link to understand the Multi-channel CNN Model for Text Classification.

The code is based on this tutorial.

I have understood most of it; however, I can't understand how Keras defines the output shapes of certain layers.

Here is the code:

It defines a model with three input channels for processing 4-grams, 6-grams, and 8-grams of movie review text.

#Skipped keras imports

# load a clean dataset
def load_dataset(filename):
    return load(open(filename, 'rb'))

# fit a tokenizer
def create_tokenizer(lines):
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(lines)
    return tokenizer

# calculate the maximum document length
def max_length(lines):
    return max([len(s.split()) for s in lines])

# encode a list of lines
def encode_text(tokenizer, lines, length):
    # integer encode
    encoded = tokenizer.texts_to_sequences(lines)
    # pad encoded sequences
    padded = pad_sequences(encoded, maxlen=length, padding='post')
    return padded

# define the model
def define_model(length, vocab_size):
    # channel 1
    inputs1 = Input(shape=(length,))
    embedding1 = Embedding(vocab_size, 100)(inputs1)
    conv1 = Conv1D(filters=32, kernel_size=4, activation='relu')(embedding1)
    drop1 = Dropout(0.5)(conv1)
    pool1 = MaxPooling1D(pool_size=2)(drop1)
    flat1 = Flatten()(pool1)
    # channel 2
    inputs2 = Input(shape=(length,))
    embedding2 = Embedding(vocab_size, 100)(inputs2)
    conv2 = Conv1D(filters=32, kernel_size=6, activation='relu')(embedding2)
    drop2 = Dropout(0.5)(conv2)
    pool2 = MaxPooling1D(pool_size=2)(drop2)
    flat2 = Flatten()(pool2)
    # channel 3
    inputs3 = Input(shape=(length,))
    embedding3 = Embedding(vocab_size, 100)(inputs3)
    conv3 = Conv1D(filters=32, kernel_size=8, activation='relu')(embedding3)
    drop3 = Dropout(0.5)(conv3)
    pool3 = MaxPooling1D(pool_size=2)(drop3)
    flat3 = Flatten()(pool3)
    # merge
    merged = concatenate([flat1, flat2, flat3])
    # interpretation
    dense1 = Dense(10, activation='relu')(merged)
    outputs = Dense(1, activation='sigmoid')(dense1)
    model = Model(inputs=[inputs1, inputs2, inputs3], outputs=outputs)
    # compile
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize
    print(model.summary())
    plot_model(model, show_shapes=True, to_file='multichannel.png')
    return model

# load training dataset
trainLines, trainLabels = load_dataset('train.pkl')
# create tokenizer
tokenizer = create_tokenizer(trainLines)
# calculate max document length
length = max_length(trainLines)
# calculate vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Max document length: %d' % length)
print('Vocabulary size: %d' % vocab_size)
# encode data
trainX = encode_text(tokenizer, trainLines, length)
print(trainX.shape)

# define model
model = define_model(length, vocab_size)
# fit model
model.fit([trainX,trainX,trainX], array(trainLabels), epochs=10, batch_size=16)
# save the model
model.save('model.h5')

Running the code:

Running the example first prints a summary of the prepared training dataset:

Max document length: 1380
Vocabulary size: 44277
(1800, 1380)

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 1380)          0
____________________________________________________________________________________________________
input_2 (InputLayer)             (None, 1380)          0
____________________________________________________________________________________________________
input_3 (InputLayer)             (None, 1380)          0
____________________________________________________________________________________________________
embedding_1 (Embedding)          (None, 1380, 100)     4427700     input_1[0][0]
____________________________________________________________________________________________________
embedding_2 (Embedding)          (None, 1380, 100)     4427700     input_2[0][0]
____________________________________________________________________________________________________
embedding_3 (Embedding)          (None, 1380, 100)     4427700     input_3[0][0]
____________________________________________________________________________________________________
conv1d_1 (Conv1D)                (None, 1377, 32)      12832       embedding_1[0][0]
____________________________________________________________________________________________________
conv1d_2 (Conv1D)                (None, 1375, 32)      19232       embedding_2[0][0]
____________________________________________________________________________________________________
conv1d_3 (Conv1D)                (None, 1373, 32)      25632       embedding_3[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 1377, 32)      0           conv1d_1[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 1375, 32)      0           conv1d_2[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout)              (None, 1373, 32)      0           conv1d_3[0][0]
____________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D)   (None, 688, 32)       0           dropout_1[0][0]
____________________________________________________________________________________________________
max_pooling1d_2 (MaxPooling1D)   (None, 687, 32)       0           dropout_2[0][0]
____________________________________________________________________________________________________
max_pooling1d_3 (MaxPooling1D)   (None, 686, 32)       0           dropout_3[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 22016)         0           max_pooling1d_1[0][0]
____________________________________________________________________________________________________
flatten_2 (Flatten)              (None, 21984)         0           max_pooling1d_2[0][0]
____________________________________________________________________________________________________
flatten_3 (Flatten)              (None, 21952)         0           max_pooling1d_3[0][0]
____________________________________________________________________________________________________
concatenate_1 (Concatenate)      (None, 65952)         0           flatten_1[0][0]
                                                                   flatten_2[0][0]
                                                                   flatten_3[0][0]
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 10)            659530      concatenate_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 1)             11          dense_1[0][0]
====================================================================================================
Total params: 14,000,337
Trainable params: 14,000,337
Non-trainable params: 0
____________________________________________________________________________________________________

And the training output:

Epoch 6/10
1800/1800 [==============================] - 30s - loss: 9.9093e-04 - acc: 1.0000
Epoch 7/10
1800/1800 [==============================] - 29s - loss: 5.1899e-04 - acc: 1.0000
Epoch 8/10
1800/1800 [==============================] - 28s - loss: 3.7958e-04 - acc: 1.0000
Epoch 9/10
1800/1800 [==============================] - 29s - loss: 3.0534e-04 - acc: 1.0000
Epoch 10/10
1800/1800 [==============================] - 29s - loss: 2.6234e-04 - acc: 1.0000

My interpretation of the layers and output shapes is as follows. Please help me understand if it is correct, as I am lost in the multiple dimensions.

input_1 (InputLayer) (None, 1380) ---> 1380 is the total number of features (that is, 1380 input neurons) per data point. 1800 is the total number of documents or data points.

embedding_1 (Embedding) (None, 1380, 100) 4427700 ----> Embedding layer: 1380 features (words), and each feature is a vector of dimension 100.

How is the number of parameters here 4427700?

conv1d_1 (Conv1D) (None, 1377, 32) 12832 ------> Conv1D has kernel_size=4. Is it a 1*4 filter used 32 times? Then how did the dimension become (None, 1377, 32) with 12832 parameters?

max_pooling1d_1 (MaxPooling1D) (None, 688, 32) ------> With MaxPooling1D(pool_size=2), how did the dimension become (None, 688, 32)?

flatten_1 (Flatten) (None, 22016) ------> Is this just the multiplication of 688 and 32?

Does every epoch train 1800 data points at once?

Please let me know how the output dimensions are calculated. Any reference or help would be appreciated.

Asked Nov 13 '19 by MAC



1 Answer

Please see the answers below:

input_1 (InputLayer) (None, 1380) ---> 1380 is the total number of features (that is, 1380 input neurons) per data point. 1800 is the total number of documents or data points.

Yes. model.fit([trainX, trainX, trainX], array(trainLabels), epochs=10, batch_size=16) says that you want the network to train 10 times (for 10 epochs) on the whole training dataset, in batches of size 16.

This means that after every 16 data points, the backpropagation algorithm is launched and the weights are updated. This happens 1800/16 times (rounded up to 113 batches) per epoch, where an epoch is one full pass over the training data.
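To make that concrete, here is a tiny sketch; the numbers 1800 and 16 are taken from the printed shape and the fit call above:

import math

n_samples = 1800   # rows in trainX, from the printed shape (1800, 1380)
batch_size = 16    # passed to model.fit

steps_per_epoch = math.ceil(n_samples / batch_size)
print(steps_per_epoch)  # 113 weight updates per epoch (the last batch has only 8 samples)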

1380 is the number of neurons in the first layer.

embedding_1 (Embedding) (None, 1380, 100) | 4427700 ----> Embedding layer: 1380 features (words), and each feature is a vector of dimension 100.

1380 is the size of the input (the number of neurons in the previous layer) and 100 is the size (length) of the embedding vector.

The number of parameters here is vocabulary_size * 100, since for each word in the vocabulary you need to train 100 parameters: 44277 * 100 = 4427700. The Embedding layer is in fact a matrix built from vocabulary_size vectors of size 100, where each row is the vector representation of one word from the vocabulary.
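As a quick check, using the numbers printed above (44277 words, embedding size 100):

vocab_size = 44277      # from "Vocabulary size: 44277"
embedding_dim = 100     # second argument of Embedding(vocab_size, 100)

print(vocab_size * embedding_dim)  # 4427700, matching the summary for embedding_1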

conv1d_1 (Conv1D) (None, 1377, 32) | 12832 ------> Conv1D has kernel_size=4. Is it a 1*4 filter used 32 times? Then how did the dimension become (None, 1377, 32) with 12832 parameters?

1380 becomes 1377 because of the kernel size. Imagine the following input (of size 10, to simplify) with a kernel of size 4:

0123456789 #input
KKKK456789
0KKKK56789
01KKKK6789
012KKKK789
0123KKKK89
01234KKKK9
012345KKKK

Look, the kernel can't move any further to the right, so for an input of size 10 and a kernel of size 4 the output length is 7. In general, for an input of length n and a kernel of size k, the output length is n - k + 1; for n=1380, k=4 the result is 1377.

The number of parameters is 12832 because it equals output_channels * (input_channels * kernel_size + 1), where the +1 is the bias of each filter. In your case it's 32 * (100 * 4 + 1) = 12832.
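A small sketch of both calculations for the first channel (all numbers come from the model above):

seq_len = 1380       # width of the embedding output
in_channels = 100    # embedding dimension
filters = 32
kernel_size = 4

out_len = seq_len - kernel_size + 1                     # no padding, stride 1
params = filters * (in_channels * kernel_size + 1)      # +1 is the bias of each filter

print(out_len, params)  # 1377 12832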

max_pooling1d_1 (MaxPooling1D) (None, 688, 32) ------> With MaxPooling1D(pool_size=2), how did the dimension become (None, 688, 32)?

Max pooling takes every two consecutive numbers and replaces them with their maximum, so you end up with floor(original_size / pool_size) values: floor(1377 / 2) = 688.
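For the first channel this is just a floor division:

conv_out_len = 1377
pool_size = 2
print(conv_out_len // pool_size)  # 688; the leftover element is dropped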

flatten_1 (Flatten) (None, 22016) ------> Is this just the multiplication of 688 and 32?

Yes, this is just 688 multiplied by 32. The flatten operation does the following:

1234
5678   -> 123456789012
9012

It takes all values from all dimensions and puts them into a one-dimensional vector.
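For example, with NumPy (a toy stand-in for a single pooled sample):

import numpy as np

pooled = np.zeros((688, 32))     # shape of one sample after max pooling
flat = pooled.flatten()
print(flat.shape)                # (22016,) = 688 * 32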

Does every epoch train 1800 data points at once?

No. It takes them in batches of 16, as pointed out in the first answer. Each epoch processes all 1800 data points, in a random order, in batches of 16. An epoch is simply one full pass over the training data, after which we start reading it again.

Edit:

I will clarify how 1D convolutional layers are applied to the output of the embedding layer.

You should interpret the output of the Embedding layer as a sequence of width 1380 with 100 channels.

This is similar to 2D images: an RGB image has three channels at the input, so its shape is (width, height, 3). When you apply a convolutional layer with 32 filters (the filter size is irrelevant here), the convolution is applied simultaneously to all channels and the output shape is (new_width, new_height, 32). Notice that the last dimension of the output equals the number of filters.

Back to your example: treat the output shape of the embedding layer as (width, channels). The 1D convolutional layer with 32 filters and kernel size 4 is then applied to a sequence of width 1380 and depth 100. As a result, you get an output of shape (1377, 32).
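If you want to verify these shapes yourself, here is a minimal one-channel sketch (assuming the tensorflow.keras import path, since the question's imports are skipped; building the model is enough, no training needed):

from tensorflow.keras.layers import Input, Embedding, Conv1D, MaxPooling1D, Flatten
from tensorflow.keras.models import Model

inp = Input(shape=(1380,))
emb = Embedding(44277, 100)(inp)                                   # (None, 1380, 100)
conv = Conv1D(filters=32, kernel_size=4, activation='relu')(emb)   # (None, 1377, 32)
pool = MaxPooling1D(pool_size=2)(conv)                             # (None, 688, 32)
flat = Flatten()(pool)                                             # (None, 22016)

Model(inp, flat).summary()  # prints the same shapes as the first channel above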

Answered Jan 04 '23 by Piotr Grzybowski