I'm trying to load a pretrained network, and I'm getting the following error
F1101 23:03:41.857909 73 net.cpp:757] Cannot copy param 0 weights from layer 'fc4'; shape mismatch. Source param shape is 512 4096 (2097152); target param shape is 512 256 4 4 (2097152). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
I notice that 512 x 256 x 4 x 4 == 512 x 4096, so it seems that in saving and reloading the network weights the layer was somehow flattened.
How can I counteract this error?
I'm trying to use the D-CNN pretrained network in this GitHub repository.
I load the network with
import caffe
net = caffe.Net('deploy_D-CNN.prototxt', 'D-CNN.caffemodel', caffe.TEST)
The prototxt file is
name: "D-CNN"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 259
input_dim: 259
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 64
kernel_size: 5
stride: 2
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "fc4"
type: "Convolution"
bottom: "conv3"
top: "fc4"
convolution_param {
num_output: 512
pad: 0
kernel_size: 4
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "fc4"
top: "fc4"
}
layer {
name: "drop4"
type: "Dropout"
bottom: "fc4"
top: "fc4"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "pool5_spm3"
type: "Pooling"
bottom: "fc4"
top: "pool5_spm3"
pooling_param {
pool: MAX
kernel_size: 10
stride: 10
}
}
layer {
name: "pool5_spm3_flatten"
type: "Flatten"
bottom: "pool5_spm3"
top: "pool5_spm3_flatten"
}
layer {
name: "pool5_spm2"
type: "Pooling"
bottom: "fc4"
top: "pool5_spm2"
pooling_param {
pool: MAX
kernel_size: 14
stride: 14
}
}
layer {
name: "pool5_spm2_flatten"
type: "Flatten"
bottom: "pool5_spm2"
top: "pool5_spm2_flatten"
}
layer {
name: "pool5_spm1"
type: "Pooling"
bottom: "fc4"
top: "pool5_spm1"
pooling_param {
pool: MAX
kernel_size: 29
stride: 29
}
}
layer {
name: "pool5_spm1_flatten"
type: "Flatten"
bottom: "pool5_spm1"
top: "pool5_spm1_flatten"
}
layer {
name: "pool5_spm"
type: "Concat"
bottom: "pool5_spm1_flatten"
bottom: "pool5_spm2_flatten"
bottom: "pool5_spm3_flatten"
top: "pool5_spm"
concat_param {
concat_dim: 1
}
}
layer {
name: "fc4_2"
type: "InnerProduct"
bottom: "pool5_spm"
top: "fc4_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 512
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "fc4_2"
top: "fc4_2"
}
layer {
name: "drop4"
type: "Dropout"
bottom: "fc4_2"
top: "fc4_2"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc5"
type: "InnerProduct"
bottom: "fc4_2"
top: "fc5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 19
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "prob"
type: "Softmax"
bottom: "fc5"
top: "prob"
}
It seems like you are taking a pretrained net where layer "fc4"
was a fully connected layer (aka type: "InnerProduct"
layer) and it was "reshaped" into a convolutional layer.
Since both inner product layer and convolutional layer performs roughly the same linear operation on the inputs this change can be made under certain assumptions (see, e.g. here).
As you already identified correctly the weights of the original pre-trained fully connected layer were saved "flattened" w.r.t the shape caffe expects for a convolutional layer.
I think a solution to this problem can be using share_mode: PERMISSIVE
:
layer {
name: "fc4"
type: "Convolution"
bottom: "conv3"
top: "fc4"
convolution_param {
num_output: 512
pad: 0
kernel_size: 4
}
param {
lr_mult: 1
decay_mult: 1
share_mode: PERMISSIVE # should help caffe overcome the shape mismatch
}
param {
lr_mult: 2
decay_mult: 0
share_mode: PERMISSIVE
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With