Understanding shapes of Keras layers


I am going through this link to understand the Multi-channel CNN Model for Text Classification.

The code is based on this tutorial.

I have understood most of it; however, I can't understand how Keras defines the output shapes of certain layers.

Here is the code:

It defines a model with three input channels for processing 4-grams, 6-grams, and 8-grams of movie review text.

#Skipped keras imports

# load a clean dataset
def load_dataset(filename):
    return load(open(filename, 'rb'))

# fit a tokenizer
def create_tokenizer(lines):
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(lines)
    return tokenizer

# calculate the maximum document length
def max_length(lines):
    return max([len(s.split()) for s in lines])

# encode a list of lines
def encode_text(tokenizer, lines, length):
    # integer encode
    encoded = tokenizer.texts_to_sequences(lines)
    # pad encoded sequences
    padded = pad_sequences(encoded, maxlen=length, padding='post')
    return padded

# define the model
def define_model(length, vocab_size):
    # channel 1
    inputs1 = Input(shape=(length,))
    embedding1 = Embedding(vocab_size, 100)(inputs1)
    conv1 = Conv1D(filters=32, kernel_size=4, activation='relu')(embedding1)
    drop1 = Dropout(0.5)(conv1)
    pool1 = MaxPooling1D(pool_size=2)(drop1)
    flat1 = Flatten()(pool1)
    # channel 2
    inputs2 = Input(shape=(length,))
    embedding2 = Embedding(vocab_size, 100)(inputs2)
    conv2 = Conv1D(filters=32, kernel_size=6, activation='relu')(embedding2)
    drop2 = Dropout(0.5)(conv2)
    pool2 = MaxPooling1D(pool_size=2)(drop2)
    flat2 = Flatten()(pool2)
    # channel 3
    inputs3 = Input(shape=(length,))
    embedding3 = Embedding(vocab_size, 100)(inputs3)
    conv3 = Conv1D(filters=32, kernel_size=8, activation='relu')(embedding3)
    drop3 = Dropout(0.5)(conv3)
    pool3 = MaxPooling1D(pool_size=2)(drop3)
    flat3 = Flatten()(pool3)
    # merge
    merged = concatenate([flat1, flat2, flat3])
    # interpretation
    dense1 = Dense(10, activation='relu')(merged)
    outputs = Dense(1, activation='sigmoid')(dense1)
    model = Model(inputs=[inputs1, inputs2, inputs3], outputs=outputs)
    # compile
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize
    print(model.summary())
    plot_model(model, show_shapes=True, to_file='multichannel.png')
    return model

# load training dataset
trainLines, trainLabels = load_dataset('train.pkl')
# create tokenizer
tokenizer = create_tokenizer(trainLines)
# calculate max document length
length = max_length(trainLines)
# calculate vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Max document length: %d' % length)
print('Vocabulary size: %d' % vocab_size)
# encode data
trainX = encode_text(tokenizer, trainLines, length)
print(trainX.shape)

# define model
model = define_model(length, vocab_size)
# fit model
model.fit([trainX,trainX,trainX], array(trainLabels), epochs=10, batch_size=16)
# save the model
model.save('model.h5')

Running the code:

Running the example first prints a summary of the prepared training dataset:

Max document length: 1380
Vocabulary size: 44277
(1800, 1380)

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 1380)          0
____________________________________________________________________________________________________
input_2 (InputLayer)             (None, 1380)          0
____________________________________________________________________________________________________
input_3 (InputLayer)             (None, 1380)          0
____________________________________________________________________________________________________
embedding_1 (Embedding)          (None, 1380, 100)     4427700     input_1[0][0]
____________________________________________________________________________________________________
embedding_2 (Embedding)          (None, 1380, 100)     4427700     input_2[0][0]
____________________________________________________________________________________________________
embedding_3 (Embedding)          (None, 1380, 100)     4427700     input_3[0][0]
____________________________________________________________________________________________________
conv1d_1 (Conv1D)                (None, 1377, 32)      12832       embedding_1[0][0]
____________________________________________________________________________________________________
conv1d_2 (Conv1D)                (None, 1375, 32)      19232       embedding_2[0][0]
____________________________________________________________________________________________________
conv1d_3 (Conv1D)                (None, 1373, 32)      25632       embedding_3[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 1377, 32)      0           conv1d_1[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 1375, 32)      0           conv1d_2[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout)              (None, 1373, 32)      0           conv1d_3[0][0]
____________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D)   (None, 688, 32)       0           dropout_1[0][0]
____________________________________________________________________________________________________
max_pooling1d_2 (MaxPooling1D)   (None, 687, 32)       0           dropout_2[0][0]
____________________________________________________________________________________________________
max_pooling1d_3 (MaxPooling1D)   (None, 686, 32)       0           dropout_3[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 22016)         0           max_pooling1d_1[0][0]
____________________________________________________________________________________________________
flatten_2 (Flatten)              (None, 21984)         0           max_pooling1d_2[0][0]
____________________________________________________________________________________________________
flatten_3 (Flatten)              (None, 21952)         0           max_pooling1d_3[0][0]
____________________________________________________________________________________________________
concatenate_1 (Concatenate)      (None, 65952)         0           flatten_1[0][0]
                                                                   flatten_2[0][0]
                                                                   flatten_3[0][0]
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 10)            659530      concatenate_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 1)             11          dense_1[0][0]
====================================================================================================
Total params: 14,000,337
Trainable params: 14,000,337
Non-trainable params: 0
____________________________________________________________________________________________________

And the training output:

Epoch 6/10
1800/1800 [==============================] - 30s - loss: 9.9093e-04 - acc: 1.0000
Epoch 7/10
1800/1800 [==============================] - 29s - loss: 5.1899e-04 - acc: 1.0000
Epoch 8/10
1800/1800 [==============================] - 28s - loss: 3.7958e-04 - acc: 1.0000
Epoch 9/10
1800/1800 [==============================] - 29s - loss: 3.0534e-04 - acc: 1.0000
Epoch 10/10
1800/1800 [==============================] - 29s - loss: 2.6234e-04 - acc: 1.0000

My interpretation of the layers and output shapes is as follows. Please help me understand if it is correct, as I am lost in the multiple dimensions.

input_1 (InputLayer) (None, 1380) ---> 1380 is the total number of features (that is, 1380 input neurons) per data point. 1800 is the total number of documents or data points.

embedding_1 (Embedding) (None, 1380, 100) 4427700 ----> Embedding layer: 1380 features (words), and each feature is a vector of dimension 100.

How is the number of parameters here 4427700?

conv1d_1 (Conv1D) (None, 1377, 32) 12832 ------> Conv1D has kernel_size=4. Is it a 1*4 filter used 32 times? Then how did the dimension become (None, 1377, 32) with 12832 parameters?

max_pooling1d_1 (MaxPooling1D) (None, 688, 32) ------> With MaxPooling1D(pool_size=2), how did the dimension become (None, 688, 32)?

flatten_1 (Flatten) (None, 22016) ------> Is this just the multiplication of 688 and 32?

Does every epoch train 1800 data points at once?

Please let me know how the output dimensions are calculated. Any reference or help would be appreciated.

Asked Nov 13 '19 by MAC



1 Answer

Please see the answers below:

input_1 (InputLayer) (None, 1380) ---> 1380 is the total number of features (that is, 1380 input neurons) per data point. 1800 is the total number of documents or data points.

Yes. model.fit([trainX, trainX, trainX], array(trainLabels), epochs=10, batch_size=16) says that you want the network to train 10 times (for 10 epochs) on the whole training dataset, in batches of size 16.

This means that after every 16 data points, the backpropagation algorithm is launched and the weights are updated. This happens 1800/16 times (rounded up to 113 batches) per epoch, where an epoch is one full pass over the training data.
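To make that concrete, here is a tiny sketch; the numbers 1800 and 16 are taken from the printed shape and the fit call above:

import math

n_samples = 1800   # rows in trainX, from the printed shape (1800, 1380)
batch_size = 16    # passed to model.fit

steps_per_epoch = math.ceil(n_samples / batch_size)
print(steps_per_epoch)  # 113 weight updates per epoch (the last batch has only 8 samples)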

1380 is the number of neurons in the first layer.

embedding_1 (Embedding) (None, 1380, 100) | 4427700 ----> Embedding layer: 1380 features (words), and each feature is a vector of dimension 100.

1380 is the size of the input (the number of neurons in the previous layer) and 100 is the size (length) of the embedding vector.

The number of parameters here is vocabulary_size * 100, since for each word in the vocabulary you need to train 100 parameters: 44277 * 100 = 4427700. The Embedding layer is in fact a matrix built from vocabulary_size vectors of size 100, where each row is the vector representation of one word from the vocabulary.
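As a quick check, using the numbers printed above (44277 words, embedding size 100):

vocab_size = 44277      # from "Vocabulary size: 44277"
embedding_dim = 100     # second argument of Embedding(vocab_size, 100)

print(vocab_size * embedding_dim)  # 4427700, matching the summary for embedding_1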

conv1d_1 (Conv1D) (None, 1377, 32) | 12832 ------> Conv1D has kernel_size=4. Is it a 1*4 filter used 32 times? Then how did the dimension become (None, 1377, 32) with 12832 parameters?

1380 becomes 1377 because of the kernel size. Imagine the following input (of size 10, to simplify) with a kernel of size 4:

0123456789 #input
KKKK456789
0KKKK56789
01KKKK6789
012KKKK789
0123KKKK89
01234KKKK9
012345KKKK

Look, the kernel can't move any further to the right, so for an input of size 10 and a kernel of size 4 the output length is 7. In general, for an input of length n and a kernel of size k, the output length is n - k + 1; for n=1380, k=4 the result is 1377.

The number of parameters is 12832 because it equals output_channels * (input_channels * kernel_size + 1), where the +1 is the bias of each filter. In your case it's 32 * (100 * 4 + 1) = 12832.
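A small sketch of both calculations for the first channel (all numbers come from the model above):

seq_len = 1380       # width of the embedding output
in_channels = 100    # embedding dimension
filters = 32
kernel_size = 4

out_len = seq_len - kernel_size + 1                     # no padding, stride 1
params = filters * (in_channels * kernel_size + 1)      # +1 is the bias of each filter

print(out_len, params)  # 1377 12832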

max_pooling1d_1 (MaxPooling1D) (None, 688, 32) ------> With MaxPooling1D(pool_size=2), how did the dimension become (None, 688, 32)?

Max pooling takes every two consecutive numbers and replaces them with their maximum, so you end up with floor(original_size / pool_size) values: floor(1377 / 2) = 688.
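For the first channel this is just a floor division:

conv_out_len = 1377
pool_size = 2
print(conv_out_len // pool_size)  # 688; the leftover element is dropped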

flatten_1 (Flatten) (None, 22016) ------> Is this just the multiplication of 688 and 32?

Yes, this is just 688 multiplied by 32. The flatten operation does the following:

1234
5678   -> 123456789012
9012

It takes all values from all dimensions and puts them into a one-dimensional vector.
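For example, with NumPy (a toy stand-in for a single pooled sample):

import numpy as np

pooled = np.zeros((688, 32))     # shape of one sample after max pooling
flat = pooled.flatten()
print(flat.shape)                # (22016,) = 688 * 32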

Does every epoch train 1800 data points at once?

No. It takes them in batches of 16, as pointed out in the first answer. Each epoch processes all 1800 data points, in a random order, in batches of 16. An epoch is simply one full pass over the training data, after which we start reading it again.

Edit:

I will clarify how 1D convolutional layers are applied to the output of the embedding layer.

You should interpret the output of the Embedding layer as a sequence of width 1380 with 100 channels.

This is similar to 2D images: an RGB image has three channels at the input, so its shape is (width, height, 3). When you apply a convolutional layer with 32 filters (the filter size is irrelevant here), the convolution is applied simultaneously to all channels and the output shape is (new_width, new_height, 32). Notice that the last dimension of the output equals the number of filters.

Back to your example: treat the output shape of the embedding layer as (width, channels). The 1D convolutional layer with 32 filters and kernel size 4 is then applied to a sequence of width 1380 and depth 100. As a result, you get an output of shape (1377, 32).
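If you want to verify these shapes yourself, here is a minimal one-channel sketch (assuming the tensorflow.keras import path, since the question's imports are skipped; building the model is enough, no training needed):

from tensorflow.keras.layers import Input, Embedding, Conv1D, MaxPooling1D, Flatten
from tensorflow.keras.models import Model

inp = Input(shape=(1380,))
emb = Embedding(44277, 100)(inp)                                   # (None, 1380, 100)
conv = Conv1D(filters=32, kernel_size=4, activation='relu')(emb)   # (None, 1377, 32)
pool = MaxPooling1D(pool_size=2)(conv)                             # (None, 688, 32)
flat = Flatten()(pool)                                             # (None, 22016)

Model(inp, flat).summary()  # prints the same shapes as the first channel above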

Answered Jan 04 '23 by Piotr Grzybowski