
How to convolve two blobs in caffe

In caffe, the convolution layer takes one bottom blob and convolves it with learned filters (which are initialized using a weight filler - "Xavier", "MSRA" etc.). My question is whether we can simply convolve two bottom blobs and produce a top blob, and what the most elegant way of doing this would be. The purpose is this: one of the bottom blobs will be data and the other one will be a dynamic filter (changing depending on the data) produced by previous layers (I am trying to implement dynamic convolution).
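For concreteness, here is a minimal standalone sketch of the operation I have in mind (this is not caffe code; the function name, memory layouts and simplifying assumptions - stride 1, no padding, no bias, single group - are just illustrative):

#include <vector>

// data:    num x channels x height x width          (flattened, row-major)
// filters: num x num_out x channels x k_h x k_w     (one filter set per sample)
// output:  num x num_out x (height - k_h + 1) x (width - k_w + 1)
void dynamic_conv_reference(const std::vector<float>& data,
                            const std::vector<float>& filters,
                            std::vector<float>* output,
                            int num, int channels, int height, int width,
                            int num_out, int k_h, int k_w) {
  const int h_out = height - k_h + 1;
  const int w_out = width - k_w + 1;
  output->assign(static_cast<size_t>(num) * num_out * h_out * w_out, 0.f);
  for (int n = 0; n < num; ++n)          // each sample n uses its own filters
    for (int o = 0; o < num_out; ++o)
      for (int y = 0; y < h_out; ++y)
        for (int x = 0; x < w_out; ++x) {
          float acc = 0.f;
          for (int c = 0; c < channels; ++c)
            for (int i = 0; i < k_h; ++i)
              for (int j = 0; j < k_w; ++j)
                acc += data[((n * channels + c) * height + y + i) * width + (x + j)]
                     * filters[(((n * num_out + o) * channels + c) * k_h + i) * k_w + j];
          (*output)[((n * num_out + o) * h_out + y) * w_out + x] = acc;
        }
}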

My attempt:

One way that came to my mind was to modify filler.hpp and assign a bottom blob as the filler matrix itself (instead of "Xavier", "MSRA" etc.), and then let the convolution layer pick it up from there. We can set lr = 0 to indicate that the weights initialized by our custom filler should not change. However, after looking at the source code, I still don't know how to do it. On the other hand, I don't want to break the workflow of caffe; I still want the conv layers to function normally when I want them to.

Obviously, a more tedious way is to use a combination of Slice, Tile and/or Scale layers to literally implement the convolution. I think it would work, but it would turn out to be messy. Any other thoughts?

Edit 1:

I wrote a new layer by modifying the convolution layer of caffe. In particular, in src/caffe/layers/conv_layer.cpp, on line 27, it takes the weight defined by the filler and convolves it with the bottom blob. So instead of populating that blob from the filler, I modified the layer so that it now takes two bottoms, and one of the bottoms is used directly in place of the filler-initialized weights. Then I had to make some other changes, such as:

  1. The weight blob has the same value for all the samples, whereas here it will have a different value for different samples. So I changed line 32 from:
this->forward_cpu_gemm(
    bottom_data + n * this->bottom_dim_, 
    weight, 
    top_data + n * this->top_dim_);

to:

this->forward_cpu_gemm(
    bottom_data + n * bottom[1]->count(1),
    bottom[0]->cpu_data() + n * bottom[0]->count(1), 
    top_data + n * this->top_dim_);

To make things simpler, I assumed that there is no bias term, the stride is always 1, the padding is always 0, the group is always 1, etc. However, when I tested the forward pass, it gave me weird answers (with a simple convolution kernel = np.ones((1,1,3,3))). The learning rates were set to zero for this kernel so that it doesn't change, but I still can't get the right answer. Any suggestions will be appreciated.
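For reference, with those assumptions the whole forward pass of my modified layer boils down to roughly the following (a sketch; the class name DynamicConvolutionLayer is just a placeholder, bottom[0] holds the per-sample kernels and bottom[1] holds the data):

// Rough sketch of the modified Forward_cpu. bottom[0] = per-sample kernels,
// bottom[1] = data; bias, stride, padding and group handling are stripped out.
template <typename Dtype>
void DynamicConvolutionLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[1]->cpu_data();   // data to be convolved
  const Dtype* weight_data = bottom[0]->cpu_data();   // dynamic kernels
  Dtype* top_data = top[0]->mutable_cpu_data();
  for (int n = 0; n < this->num_; ++n) {
    this->forward_cpu_gemm(bottom_data + n * bottom[1]->count(1),
                           weight_data + n * bottom[0]->count(1),
                           top_data + n * this->top_dim_);
  }
}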

Please do not propose solutions using existing layers such as Slice, Eltwise, Crop. I have already implemented that - it works - but it is unbelievably complex and memory inefficient.

asked Jul 31 '16 by Autonomous




1 Answer

I think you are on the right track overall.

For the "weird" convolution results, I guess the bug most possibly is:

Consider 2D convolution, and suppose bottom[1]'s shape is (num, channels, height, width).

Convolution in caffe is performed as a multiplication of two matrices: weight (representing the convolution kernels) and col_buffer (the data to be convolved, reorganized by im2col). weight has num_out rows and channels / this->group_ * kernel_h * kernel_w columns, while col_buffer has channels / this->group_ * kernel_h * kernel_w rows and height_out * width_out columns. So, as the weight blob of a dynamic convolution layer, bottom[0]'s shape had better be (num, num_out, channels/group, kernel_h, kernel_w) to satisfy

bottom[0]->count(1) == num_out * channels / this->group_ * kernel_h * kernel_w

in which num_out is the number of the dynamic convolution layer's output feature maps.
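For example (hypothetical numbers, just to make the dimensions concrete): with channels = 3, group = 1, kernel_h = kernel_w = 3 and num_out = 8, the per-sample multiplication is

weight     : num_out x (channels/group * kernel_h * kernel_w) = 8 x 27
col_buffer : (channels/group * kernel_h * kernel_w) x (height_out * width_out) = 27 x (height_out * width_out)
output     : weight * col_buffer = 8 x (height_out * width_out)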

That means, to make the convolution function

this->forward_cpu_gemm(bottom_data + n * bottom[1]->count(1) 
                     , bottom[0]->cpu_data() + n * bottom[0]->count(1)
                     , top_data + n * this->top_dim_);

work properly, you must make sure that

bottom[0]->shape(0) == bottom[1]->shape(0) == num
bottom[0]->count(1) == num_out * channels / this->group_ * kernel_h * kernel_w

So most probably the simple 4-dimensional convolution kernel np.ones((1,1,3,3)) you used does not satisfy the above conditions, which results in the wrong convolution output.
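In code, the checks could look roughly like the following inside your layer's Reshape() (just a sketch; the member names come from BaseConvolutionLayer, and bottom[0] is taken as the kernel blob, matching your forward_cpu_gemm call):

// Sketch of the shape checks for the dynamic kernel blob (bottom[0] here).
// kernel_shape_.cpu_data()[0] / [1] are kernel_h / kernel_w for 2D convolution.
CHECK_EQ(bottom[0]->shape(0), bottom[1]->shape(0))
    << "data and dynamic kernels must have the same batch size";
CHECK_EQ(bottom[0]->count(1),
         this->num_output_ * this->channels_ / this->group_
             * this->kernel_shape_.cpu_data()[0]
             * this->kernel_shape_.cpu_data()[1])
    << "dynamic kernel blob has the wrong per-sample count";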

Hope this is clear and helps you.

########## Update 1, Oct 10th, 2016, Beijing time ##########

I added a dynamic convolution layer here, but with no unit tests yet. This layer doesn't break the workflow of caffe; it only changes some private members of the BaseConvolution class to protected.

The files involved are:

include/caffe/layers/dyn_conv_layer.hpp, base_conv_layer.hpp
src/caffe/layers/dyn_conv_layer.cpp(cu)

It is almost the same as the convolution layer in caffe; the main differences are:

  1. It overrides LayerSetUp() to initialize this->kernel_dim_, this->weight_offset_, etc. properly for convolution, and skips initializing this->blobs_, which the Convolution layer routinely uses to hold the weights and bias;
  2. It overrides Reshape() to check that bottom[1], as the kernel container, has the proper shape for convolution (a rough sketch follows below).
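For orientation, the class declaration in include/caffe/layers/dyn_conv_layer.hpp amounts to roughly the following (simplified; the class name and exact signatures here are paraphrased, so please refer to the files listed above for the real code):

#include <vector>
#include "caffe/blob.hpp"
#include "caffe/layers/base_conv_layer.hpp"

namespace caffe {

// Simplified sketch of the layer declaration. The layer takes two bottoms
// (data and per-sample kernels) and keeps no learnable blobs of its own.
template <typename Dtype>
class DynamicConvolutionLayer : public BaseConvolutionLayer<Dtype> {
 public:
  explicit DynamicConvolutionLayer(const LayerParameter& param)
      : BaseConvolutionLayer<Dtype>(param) {}
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);  // sets kernel_dim_, weight_offset_, ...
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);  // checks the kernel bottom's shape
  virtual inline const char* type() const { return "DynamicConvolution"; }
  virtual inline int ExactNumBottomBlobs() const { return 2; }

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  virtual inline bool reverse_dimensions() { return false; }
  virtual void compute_output_shape();
};

}  // namespace caffe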

Because I have not had time to test it, there may be bugs, and I will be very glad to see your feedback.

########## Update 2, Oct 12th, 2016, Beijing time ##########

I have just added a test case for dynamic convolution. The file involved is src/caffe/test/test_dyn_convolution_layer.cpp. It seems to work fine, but may need more thorough tests.

You can build this caffe and check it with:

cd $CAFFE_ROOT/build && ccmake ..
cmake -DBUILD_only_tests="dyn_convolution_layer" ..
make runtest

answered Oct 11 '22 by Dale