I am trying to train a Convolutional Neural Network, using sparse autoencoders to learn the filters for the convolution layer. I am using the UFLDL code to construct patches and to train the network. My code is the following:
===========================================================================
imageDim = 30; % image dimension
imageChannels = 3; % number of channels (rgb, so 3)
patchDim = 10; % patch dimension
numPatches = 100000; % number of patches
visibleSize = patchDim * patchDim * imageChannels; % number of input units
outputSize = visibleSize; % number of output units
hiddenSize = 400; % number of hidden units
epsilon = 0.1; % epsilon for ZCA whitening
poolDim = 10; % dimension of pooling region
optTheta = zeros(2*hiddenSize*visibleSize+hiddenSize+visibleSize, 1);
ZCAWhite = zeros(visibleSize, visibleSize);
meanPatch = zeros(visibleSize, 1);
load patches_16_1
% Display and check to see that the features look good
W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
displayColorNetwork( (W*ZCAWhite));
stepSize = 100;
assert(mod(hiddenSize, stepSize) == 0, 'stepSize should divide hiddenSize');
load train.mat % loads numTrainImages, trainImages, trainLabels
% no separate test set here; testImages is copied from trainImages below
% size 30x30x3x8862
numTestImages = 8862;
numTrainImages = 8862;
pooledFeaturesTrain = zeros(hiddenSize, numTrainImages, floor((imageDim - patchDim + 1) / poolDim), floor((imageDim - patchDim + 1) / poolDim) );
pooledFeaturesTest = zeros(hiddenSize, numTestImages, ...
floor((imageDim - patchDim + 1) / poolDim), ...
floor((imageDim - patchDim + 1) / poolDim) );
tic();
testImages = trainImages; % reuse the training images as the test set
for convPart = 1:(hiddenSize / stepSize)
    featureStart = (convPart - 1) * stepSize + 1;
    featureEnd = convPart * stepSize;
    fprintf('Step %d: features %d to %d\n', convPart, featureStart, featureEnd);
    Wt = W(featureStart:featureEnd, :);
    bt = b(featureStart:featureEnd);
    fprintf('Convolving and pooling train images\n');
    convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
        trainImages, Wt, bt, ZCAWhite, meanPatch);
    pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
    pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
    toc();
    clear convolvedFeaturesThis pooledFeaturesThis;
    fprintf('Convolving and pooling test images\n');
    convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
        testImages, Wt, bt, ZCAWhite, meanPatch);
    pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
    pooledFeaturesTest(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
    toc();
    clear convolvedFeaturesThis pooledFeaturesThis;
end
===========================================================================
I have problems computing the convolution and pooling layers. On the line

pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;

I get a "subscripted assignment dimension mismatch" error. The patches were computed normally.
I am trying to understand what exactly the convPart variable is doing and what pooledFeaturesThis is. Secondly, I notice that the mismatch is on that assignment line: the size of pooledFeaturesThis is 100x3x2x2, while the size of pooledFeaturesTrain is 400x8862x2x2. What exactly does pooledFeaturesTrain represent? Is the 2x2 a result for every filter? cnnConvolve can be found here:
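A quick size check right before the assignment makes the disagreement explicit (a minimal debugging sketch using the variables from the script above):

% debugging sketch: print both sizes right before the failing assignment
fprintf('pooledFeaturesThis: %s\n', mat2str(size(pooledFeaturesThis)));
fprintf('target slice: %s\n', ...
    mat2str(size(pooledFeaturesTrain(featureStart:featureEnd, :, :, :))));
% this prints [100 3 2 2] vs [100 8862 2 2]: the second (image count)
% dimension disagrees, so only 3 images went through cnnConvolve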
EDIT: I have changed my code a little and it works now. However, I am still a bit concerned about my understanding of the code.
OK, so in this line you are setting the pooling region:
poolDim = 10; % dimension of pooling region
This means that for each kernel in each layer you take the image and pool an area of 10x10 pixels. From your code it looks like you are applying a mean function, which means it takes a patch, computes its mean, and outputs that to the next layer... i.e., it takes the image from, say, 100x100 down to 10x10. In your network you are repeating convolution+pooling until you get down to a 2x2 image, based on this output of yours (btw, this is not generally good practice in my experience):

400x8862x2x2
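For intuition, non-overlapping mean pooling is roughly the following (a minimal sketch of what cnnPool does in the UFLDL exercise; meanPool is my own placeholder name):

function pooledFeatures = meanPool(poolDim, convolvedFeatures)
% convolvedFeatures: numFeatures x numImages x convolvedDim x convolvedDim
[numFeatures, numImages, convolvedDim, ~] = size(convolvedFeatures);
resultDim = floor(convolvedDim / poolDim);
pooledFeatures = zeros(numFeatures, numImages, resultDim, resultDim);
for r = 1:resultDim
    for c = 1:resultDim
        rows = (r-1)*poolDim + 1 : r*poolDim;
        cols = (c-1)*poolDim + 1 : c*poolDim;
        region = convolvedFeatures(:, :, rows, cols);
        % average each poolDim x poolDim region down to a single value
        pooledFeatures(:, :, r, c) = mean(mean(region, 4), 3);
    end
end
end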
Anyway, back to your code. Notice that at the beginning of your training you do the following initialization:
pooledFeaturesTrain = zeros(hiddenSize, numTrainImages, floor((imageDim - patchDim + 1) / poolDim), floor((imageDim - patchDim + 1) / poolDim) );
So your error is quite literal: the size of the array holding the output of the convolution+pooling step does not match the size of the array you preallocated.
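To spell out where those numbers come from, using the values from your script:

convolvedDim = imageDim - patchDim + 1;    % 30 - 10 + 1 = 21
pooledDim = floor(convolvedDim / poolDim); % floor(21 / 10) = 2
% pooledFeaturesTrain is hiddenSize x numTrainImages x pooledDim x pooledDim,
% i.e. 400 x 8862 x 2 x 2: one 2x2 pooled map per filter per image. So yes,
% the 2x2 is the pooled result for every filter; your pooledFeaturesThis is
% 100 x 3 x 2 x 2 because only 3 images went through the convolution.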
The question is now how to fix it. I suppose a lazy man's way is to take out the preallocation, but that will drastically slow down your code and is not guaranteed to work if you have more than one layer.
I suggest you instead make pooledFeaturesTrain a cell array of arrays, one cell per layer. So instead of this
pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
you'd do something more along the lines of this:
pooledFeaturesTrain{n} = pooledFeaturesThis;
where n is the current layer.
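A minimal sketch of that approach (numLayers and the rand() sizes here are hypothetical stand-ins for your real per-layer convolve+pool output):

numLayers = 2;                            % hypothetical number of conv+pool layers
pooledFeaturesTrain = cell(numLayers, 1); % one cell per layer; sizes may differ
for n = 1:numLayers
    % stand-in for the real convolution + pooling output of layer n
    pooledFeaturesThis = rand(100, 10, 4 - n, 4 - n);  % toy sizes only
    pooledFeaturesTrain{n} = pooledFeaturesThis;       % no size constraint here
end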
CNN nets aren't as easy as they're cracked up to be, and even when they don't crash, getting them to train well is a feat. I highly suggest reading up on the theory of CNNs; it will make coding and debugging much easier.
Good luck with it ! :)