For context, I am relatively new to machine learning, and I am attempting a project whose goal is to classify plays in an NBA game. My inputs are sequences of 40 frames, one sequence per play, and my labels are 11 classes that together cover any given play.
The plan is to pass each frame of a sequence into a CNN to extract a set of features, and then to pass the resulting sequence of feature vectors from a given video into an RNN.
I am currently using Keras for most of my implementation and I chose to use a VGG16 model for my CNN. Here is some of the relevant code below:
video = keras.Input(shape=(None, 255, 255, 3), name='video')
cnn = keras.applications.VGG16(include_top=False, weights=None,
                               input_shape=(255, 255, 3), pooling='avg',
                               classes=11)
cnn.trainable = True
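For reference, here is roughly how I plan to wire the two pieces together, continuing from the code above. This is only a sketch of my intent, and the LSTM width of 256 is an arbitrary placeholder:

# Apply the same CNN to every frame; with pooling='avg' each frame
# is reduced to a single feature vector, giving (batch, time, features)
frame_features = keras.layers.TimeDistributed(cnn)(video)

# Run an RNN over the per-frame feature sequence and classify the play
hidden = keras.layers.LSTM(256)(frame_features)  # 256 is a placeholder width
output = keras.layers.Dense(11, activation='softmax')(hidden)

model = keras.models.Model(inputs=video, outputs=output)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])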
My question is: would it still be beneficial for me to initialize the weights of the VGG16 ConvNet to 'imagenet' if my goal is to classify video clips of NBA games? If so, why? If not, how can I train the VGG16 ConvNet to get my own set of weights, and how can I then load them into this function? I have had little luck finding any tutorials where someone loads their own set of weights into the VGG16 model.
I apologize if my questions seem naive but I would really appreciate any help in clearing this up.
Should you retrain VGG16 from scratch for your specific task? Absolutely not! Retraining such a huge network is hard and requires a lot of intuition and experience in training deep networks. Let's analyze why you can use the weights pre-trained on ImageNet for your task:
ImageNet is a huge dataset containing millions of images. VGG16 itself was trained in 3-4 days or so on a powerful GPU; on a CPU (assuming you don't have a GPU as powerful as an NVIDIA GeForce Titan X), training would take weeks.
ImageNet contains images of real-world scenes, and NBA games can also be considered real-world scenes. So it is very likely that features pre-trained on ImageNet can be used for NBA games, too.
Actually, you don't need to use all the convolutional layers of pre-trained VGG16. Take a look at a visualization of the internal VGG16 layers and what they detect (taken from this article; the image is too large, so only a link is given for compactness):
So, you can decide which kind of features will benefit your specific task. Do you need the high-level features of the 5th block? Or would the mid-level features of the 3rd block suit better? Maybe you want to stack another neural network on top of the bottom layers of VGG? For more guidance, take a look at the following tutorial, which I originally wrote for SO Documentation.
In this tutorial, three brief and comprehensive sub-examples are presented:
- loading the pre-trained weights into a Keras VGG-16 model;
- stacking a new network on top of the bottom layers of VGG;
- replacing layers of VGG while keeping the remaining pre-trained weights.
Models pre-trained on ImageNet, including VGG-16 and VGG-19, are available in Keras. Here and throughout this example, VGG-16 will be used. For more information, please visit the Keras Applications documentation.
from keras import applications

# This will load the whole VGG16 network, including the top Dense layers.
# Note: by specifying the shape of the top layers, the input tensor shape is
# forced to be (224, 224, 3), therefore you can use it only on 224x224 images.
vgg_model = applications.VGG16(weights='imagenet', include_top=True)

# If you are only interested in the convolution filters. Note that by not
# specifying the shape of the top layers, the input tensor shape is
# (None, None, 3), so you can use the filters on images of any size.
vgg_model = applications.VGG16(weights='imagenet', include_top=False)

# If you want to specify the input tensor
from keras.layers import Input
input_tensor = Input(shape=(160, 160, 3))
vgg_model = applications.VGG16(weights='imagenet',
                               include_top=False,
                               input_tensor=input_tensor)

# To see the model's architecture and layer names, run the following
vgg_model.summary()
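As a quick sanity check that the weights loaded correctly, you can run the include_top=True variant from the first line above on a single image. This is a minimal sketch; the file name is just a placeholder:

import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions

vgg_model = applications.VGG16(weights='imagenet', include_top=True)

img = image.load_img('some_image.jpg', target_size=(224, 224))  # placeholder path
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)   # add a batch dimension: (1, 224, 224, 3)
x = preprocess_input(x)         # same preprocessing as in ImageNet training

preds = vgg_model.predict(x)
print(decode_predictions(preds, top=3))  # top-3 ImageNet classes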
Assume that for some specific task on images of size (160, 160, 3), you want to use the pre-trained bottom layers of VGG, up to the layer named block2_pool.
vgg_model = applications.VGG16(weights='imagenet',
                               include_top=False,
                               input_shape=(160, 160, 3))

# Creating a dictionary that maps layer names to the layers
layer_dict = dict([(layer.name, layer) for layer in vgg_model.layers])

# Getting the output tensor of the last VGG layer that we want to include
x = layer_dict['block2_pool'].output

# Stacking a new simple convolutional network on top of it
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
x = Conv2D(filters=64, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(10, activation='softmax')(x)

# Creating a new model. Please note that this is NOT a Sequential() model.
from keras.models import Model
custom_model = Model(inputs=vgg_model.input, outputs=x)

# Make sure that the pre-trained bottom layers are not trainable
# (the first 7 layers are everything up to and including block2_pool)
for layer in custom_model.layers[:7]:
    layer.trainable = False

# Do not forget to compile it
custom_model.compile(loss='categorical_crossentropy',
                     optimizer='rmsprop',
                     metrics=['accuracy'])
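To verify that the freeze took effect and that the model runs end to end, here is a minimal sketch with random placeholder data (only the shapes matter):

import numpy as np
from keras.utils import to_categorical

# Frozen layers should report trainable=False
for layer in custom_model.layers:
    print(layer.name, layer.trainable)

# Random placeholder data: 32 images of shape (160, 160, 3), 10 one-hot classes
x_train = np.random.random((32, 160, 160, 3))
y_train = to_categorical(np.random.randint(10, size=32), num_classes=10)

custom_model.fit(x_train, y_train, epochs=1, batch_size=8)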
Assume that you need to speed up VGG16 by replacing block1_conv1 and block1_conv2 with a single convolutional layer, in such a way that the pre-trained weights of the other layers are preserved.
The idea is to disassemble the whole network into separate layers, then assemble it back. Here is the code specifically for your task:
from keras.layers import Conv2D
from keras.models import Model

vgg_model = applications.VGG16(include_top=True, weights='imagenet')

# Disassemble layers
layers = [l for l in vgg_model.layers]

# Defining the new convolutional layer.
# Important: the number of filters should be the same!
# Note: the receptive field of two 3x3 convolutions is 5x5.
new_conv = Conv2D(filters=64,
                  kernel_size=(5, 5),
                  name='new_conv',
                  padding='same')(layers[0].output)

# Now stack everything back, skipping the two replaced layers.
# Note: if you are going to fine-tune the model, do not forget to
# mark the other layers as untrainable.
x = new_conv
for i in range(3, len(layers)):
    layers[i].trainable = False
    x = layers[i](x)

# Final touch
result_model = Model(inputs=layers[0].input, outputs=x)
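Keep in mind that new_conv starts with random weights, so some fine-tuning is required. Since the loop above already froze every other layer, compiling and fitting will update only the replacement layer:

# block1_conv1 and block1_conv2 are gone; new_conv sits in their place,
# and it is the only trainable layer in the model
result_model.summary()

result_model.compile(loss='categorical_crossentropy',
                     optimizer='rmsprop',
                     metrics=['accuracy'])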