I have not coded in years, forgive me. I am trying to do something that may be impossible. I have 38 videos of people performing the same basic movement, and I want to train a model to distinguish those doing it correctly from those doing it incorrectly. I am using color for now, because grayscale did not work either and I wanted to match the example I started from. I used the model as defined in that example (linked in the code comment below).
Environment: Keras with the TensorFlow backend, Python 3.5 (Anaconda, 64-bit), on Windows 10 (64-bit).
I was hoping to try different models on the problem and use grayscale to reduce memory, but I can't get past the first step!
Thanks!!!
Here is my code:
import time
import numpy as np
import sys
import os
import cv2
import keras
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization
from keras.layers import Conv3D, Conv2D, MaxPooling2D, GRU, ConvLSTM2D, TimeDistributed
y_cat = np.zeros(40,np.float)
good = "Good"
bad = "Bad"
batch_size = 32
num_classes = 1
epochs = 1
nvideos = 38
nframes = 130
nrows = 240
ncols = 320
nchan = 3
x_learn = np.zeros((nvideos,nframes,nrows,ncols,nchan),np.int32)
x_learn = np.load(".\\train\\datasetcolor.npy")
with open(".\\train\\tags.txt") as ft:
    y_learn = ft.readlines()
y_learn = [x.strip() for x in y_learn] 
ft.close()
# transform string tags to numeric.
for i in range (0,len(y_learn)):
    if (y_learn[i] == good): y_cat[i] = 1
    elif (y_learn[i]  == bad): y_cat[i] = 0
#build model 
# duplicating from https://github.com/fchollet/keras/blob/master/examples/conv_lstm.py
model = Sequential()
model.image_dim_ordering = 'tf'
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                   input_shape=(nframes,nrows,ncols,nchan),
                   padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                   padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                   padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                   padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(Conv3D(filters=1, kernel_size=(3, 3, 3),
               activation='sigmoid',
               padding='same', data_format='channels_last'))
model.compile(loss='binary_crossentropy', optimizer='adadelta')
print(model.summary())
# fit with first 3 videos because I don't have the horsepower yet
history = model.fit(x_learn[:3], y_learn[:3],
              batch_size=batch_size,
              epochs=epochs)
print (history)
Results:
Layer (type)                 Output Shape              Param #   
=================================================================
conv_lst_m2d_5 (ConvLSTM2D)  (None, 130, 240, 320, 40) 62080     
_________________________________________________________________
batch_normalization_5 (Batch (None, 130, 240, 320, 40) 160       
_________________________________________________________________
conv_lst_m2d_6 (ConvLSTM2D)  (None, 130, 240, 320, 40) 115360    
_________________________________________________________________
batch_normalization_6 (Batch (None, 130, 240, 320, 40) 160       
_________________________________________________________________
conv_lst_m2d_7 (ConvLSTM2D)  (None, 130, 240, 320, 40) 115360    
_________________________________________________________________
batch_normalization_7 (Batch (None, 130, 240, 320, 40) 160       
_________________________________________________________________
conv_lst_m2d_8 (ConvLSTM2D)  (None, 130, 240, 320, 40) 115360    
_________________________________________________________________
batch_normalization_8 (Batch (None, 130, 240, 320, 40) 160       
_________________________________________________________________
conv3d_1 (Conv3D)            (None, 130, 240, 320, 1)  1081      
=================================================================
Total params: 409,881.0
Trainable params: 409,561
Non-trainable params: 320.0
_________________________________________________________________
None
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-d909d285f474> in <module>()
     82 history = model.fit(x_learn[:3], y_learn[:3],
     83               batch_size=batch_size,
---> 84               epochs=epochs)
     85 
     86 print (history)
ValueError: Error when checking model target: expected conv3d_1 to have 5 dimensions, but got array with shape (3, 1)
"Target" means that the problem is in the output of your model versus the format of y_learn.
The array y_learn must have exactly the same shape as the model's output, because the model outputs a "guess", while y_learn is the "correct answer". The loss can only compare the guess with the correct answer if they have the same dimensions.
See the difference:
Model output: (None, 130, 240, 320, 1)   Targets: (None, 1)
Here "None" stands for the batch size. You passed y_learn[:3], so this training session has only 3 samples and the batch holds 3.
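To make the mismatch concrete, here is a minimal sketch of the kind of comparison being described. The helper `shapes_match` is a hypothetical name invented for illustration; it is not how Keras implements its check, only a small model of the rule "same number of dimensions, same size in each, with None meaning any batch size".

```python
def shapes_match(output_shape, target_shape):
    """Return True when two shapes agree, treating None as 'any size'.

    Hypothetical helper for illustration only; not a Keras function.
    """
    if len(output_shape) != len(target_shape):
        return False
    return all(o is None or t is None or o == t
               for o, t in zip(output_shape, target_shape))


model_output = (None, 130, 240, 320, 1)  # last layer in the question's summary
targets = (3, 1)                         # shape reported in the error message

# 5 dimensions against 2: the comparison cannot succeed.
print(len(model_output), len(targets))
print(shapes_match(model_output, targets))

# A model ending in shape (None, 1) would line up with the same targets.
print(shapes_match((None, 1), targets))
```

The first check fails purely on the number of dimensions, which is the condition the error message complains about.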
In order to correct it properly, we need to understand what y_learn is.
If I understood correctly, you have a single number, 0 or 1, for each video. If so, the shape of your targets is fine, and what you need is for your model to output shape (None, 1).
A very simple way to do that (perhaps not the best, and I couldn't be of more help here...) is to flatten the output and add a final Dense layer with just one neuron:
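As a sketch of what "one number per video" looks like as an array, the snippet below builds a target of shape (N, 1) from string tags. The function name `tags_to_labels` and the sample `tags` list are assumptions made for illustration. One thing worth checking in the question's code: the numeric labels are stored in y_cat, while y_learn still holds the "Good"/"Bad" strings, so y_cat appears to be the array that was meant to go to model.fit.

```python
import numpy as np


def tags_to_labels(tags, good="Good", bad="Bad"):
    """Map string tags to a float array of shape (len(tags), 1).

    Hypothetical helper; raises on any tag that is neither good nor bad
    so that a typo in the tag file does not silently become a 0.
    """
    values = []
    for tag in tags:
        if tag == good:
            values.append(1.0)
        elif tag == bad:
            values.append(0.0)
        else:
            raise ValueError("unexpected tag: %r" % (tag,))
    return np.array(values, dtype=np.float32).reshape(-1, 1)


# Stand-in for the stripped lines of tags.txt (made-up sample data).
tags = ["Good", "Bad", "Good"]
labels = tags_to_labels(tags)
print(labels.shape)
```

An array like `labels` has one row per video and one column, which is the layout a single sigmoid output expects.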
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
Now, when you call model.summary(), you will see the final output shape as (None, 1).
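To see why this may not be the best option, it helps to count what the two extra layers add. The sketch below is plain arithmetic on the shapes from the question's summary, with no Keras involved; the variable names are chosen here for illustration.

```python
from math import prod

# Per-sample output of conv3d_1 in the question's summary (batch axis dropped).
conv3d_output = (130, 240, 320, 1)

# Flatten turns that into one long vector per video.
flat_features = prod(conv3d_output)

# Dense(1) learns one weight per input feature plus a single bias.
dense_params = flat_features * 1 + 1

print(flat_features)  # 9984000
print(dense_params)   # 9984001
```

That single Dense layer would hold roughly 10 million weights, against about 410 thousand in the rest of the model, which is a lot to fit from 38 videos. Reducing the spatial and temporal size before flattening is one possible direction, though that goes beyond the fix described here.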