
Finding gradient of a Caffe conv-filter with regards to input

Tags: c++, python, neural-network, deep-learning, caffe

I need to find the gradient with respect to the input layer for a single convolutional filter in a convolutional neural network (CNN), as a way to visualize the filters.
Given a trained network in the Python interface of Caffe such as the one in this example, how can I find the gradient of a conv-filter with respect to the data in the input layer?

Edit:

Based on the answer by cesans, I added the code below. The dimensions of my input layer are [8, 8, 7, 96]. My first conv-layer, conv1, has 11 filters with a size of 1x5, resulting in the dimensions [8, 11, 7, 92].

    net = solver.net
    diffs = net.backward(diffs=['data', 'conv1'])
    print(diffs.keys())          # >> ['conv1', 'data']
    print(diffs['data'].shape)   # >> (8, 8, 7, 96)
    print(diffs['conv1'].shape)  # >> (8, 11, 7, 92)

As you can see from the output, the dimensions of the arrays returned by net.backward() are equal to the dimensions of my layers in Caffe. After some testing I've found that this output is the gradient of the loss with respect to the data layer and the conv1 layer, respectively.

However, my question was how to find the gradient of a single conv-filter with respect to the data in the input layer, which is something else. How can I achieve this?


asked Jul 09 '15 17:07

pir

2 Answers

You can get the gradients with respect to any layer when you run the backward() pass. Just specify the list of layers when calling the function. To show the gradients with respect to the data layer:

    import matplotlib.pyplot as plt

    net.forward()
    diffs = net.backward(diffs=['data', 'conv1'])
    data_point = 16  # index of the sample within the batch
    plt.imshow(diffs['data'][data_point].squeeze())

In some cases you may want to force all layers to carry out the backward pass; see the force_backward parameter of the model definition:

https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto

answered Sep 30 '22 09:09

cesans

A Caffe net juggles two "streams" of numbers.
The first is the data "stream": images and labels pushed through the net. As these inputs progress through the net they are converted into a high-level representation and eventually into class-probability vectors (in classification tasks).
The second "stream" holds the parameters of the different layers: the weights of the convolutions, the biases, etc. These numbers/weights are changed and learned during the training phase of the net.

Despite the fundamentally different roles these two "streams" play, Caffe nonetheless uses the same data structure, blob, to store and manage them.
However, for each layer there are two different blob vectors, one for each stream.

Here's an example that I hope will clarify this:

    import caffe
    solver = caffe.SGDSolver( PATH_TO_SOLVER_PROTOTXT )
    net = solver.net

If you now look at

net.blobs

You will see a dictionary storing a "caffe blob" object for each layer in the net. Each blob has storage for both data and gradient:

    net.blobs['data'].data.shape    # >> (32, 3, 224, 224)
    net.blobs['data'].diff.shape    # >> (32, 3, 224, 224)

And for a convolutional layer:

    net.blobs['conv1/7x7_s2'].data.shape    # >> (32, 64, 112, 112)
    net.blobs['conv1/7x7_s2'].diff.shape    # >> (32, 64, 112, 112)

net.blobs holds the first stream, the data: its blob shapes run from that of the input images up to that of the resulting class probability vector.

On the other hand, you can see another member of net

net.layers

This is a caffe vector storing the parameters of the different layers.
Looking at the first layer ('data' layer):

len(net.layers[0].blobs)    # >> 0

There are no parameters to store for an input layer.
On the other hand, for the first convolutional layer

len(net.layers[1].blobs)    # >> 2

The net stores one blob for the filter weights and another for the constant bias. Here they are:

    net.layers[1].blobs[0].data.shape  # >> (64, 3, 7, 7)
    net.layers[1].blobs[1].data.shape  # >> (64,)

As you can see, this layer performs 7x7 convolutions on a 3-channel input image and has 64 such filters.
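To make the relation between these shapes concrete, here is a minimal sketch in plain Python that derives the data-blob and weight-blob shapes of a convolution from the input shape. The helper names are hypothetical, not part of Caffe, and the padding of 3 for the 7x7 stride-2 layer is an assumption, since the answer does not state it.

```python
def conv_output_shape(input_shape, num_filters, kernel, stride=(1, 1), pad=(0, 0)):
    """Return (N, C_out, H_out, W_out) for a 2D convolution over an (N, C, H, W) input."""
    n, _, h, w = input_shape
    h_out = (h + 2 * pad[0] - kernel[0]) // stride[0] + 1
    w_out = (w + 2 * pad[1] - kernel[1]) // stride[1] + 1
    return (n, num_filters, h_out, w_out)


def conv_weight_shape(input_shape, num_filters, kernel):
    """Return (C_out, C_in, kH, kW), the shape of the filter-weight blob."""
    return (num_filters, input_shape[1], kernel[0], kernel[1])


# The question's conv1: 11 filters of size 1x5, stride 1, no padding.
question_out = conv_output_shape((8, 8, 7, 96), 11, (1, 5))

# The 7x7 stride-2 layer from this answer; pad=(3, 3) is an assumption.
answer_out = conv_output_shape((32, 3, 224, 224), 64, (7, 7), stride=(2, 2), pad=(3, 3))
answer_weights = conv_weight_shape((32, 3, 224, 224), 64, (7, 7))

print(question_out, answer_out, answer_weights)
```

Under these assumptions the computed shapes agree with the ones quoted in the question and in this answer, which is a quick way to tell whether an array belongs to the data stream or to the parameter stream.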

Now, how do you get the gradients? Well, as you noted,

    diffs = net.backward(diffs=['data','conv1/7x7_s2'])

returns the gradients of the data stream. We can verify this with:

    import numpy as np

    np.all( diffs['data'] == net.blobs['data'].diff )  # >> True
    np.all( diffs['conv1/7x7_s2'] == net.blobs['conv1/7x7_s2'].diff )  # >> True

(TL;DR) You want the gradients of the parameters; these are stored in net.layers alongside the parameters:

    net.layers[1].blobs[0].diff.shape # >> (64, 3, 7, 7)
    net.layers[1].blobs[1].diff.shape # >> (64,)
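The difference between the two kinds of gradient can be illustrated without Caffe. The sketch below is a toy 1D convolution in plain Python, with hypothetical function names and a made-up loss (the sum of the outputs). It only aims to show that the gradient with respect to the data has the shape of the input, while the gradient with respect to the parameters has the shape of the weights.

```python
def conv1d_forward(x, w, b):
    """Valid cross-correlation: y[j] = b + sum_i w[i] * x[j + i]."""
    k = len(w)
    return [b + sum(w[i] * x[j + i] for i in range(k))
            for j in range(len(x) - k + 1)]


def conv1d_backward(x, w, dy):
    """Given dL/dy, return (dL/dx, dL/dw, dL/db)."""
    k = len(w)
    dx = [0.0] * len(x)   # data-stream gradient, same shape as the input
    dw = [0.0] * k        # parameter gradient, same shape as the weights
    for j, g in enumerate(dy):
        for i in range(k):
            dx[j + i] += g * w[i]
            dw[i] += g * x[j + i]
    db = sum(dy)          # the bias is added to every output element
    return dx, dw, db


x = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
w = [0.5, -1.0, 2.0]
b = 0.1

y = conv1d_forward(x, w, b)
dy = [1.0] * len(y)       # toy loss L = sum(y), so dL/dy is all ones
dx, dw, db = conv1d_backward(x, w, dy)

print(len(dx), len(dw), db)
```

In this toy setting dx plays the role of net.blobs['data'].diff and dw, db play the role of the diff arrays in net.layers[1].blobs.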

To help you map between the names of the layers and their indices into net.layers vector, you can use net._layer_names.
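A minimal sketch of that lookup, assuming net._layer_names can be converted to a list of strings. The list below is a hypothetical stand-in so that the snippet runs without Caffe; the real names come from your prototxt.

```python
# Hypothetical stand-in for list(net._layer_names).
layer_names = ['data', 'conv1/7x7_s2', 'conv1/relu_7x7', 'pool1/3x3_s2']


def layer_index(names, layer_name):
    """Return the position of layer_name, usable as an index into net.layers."""
    names = list(names)
    if layer_name not in names:
        raise KeyError('no layer named %r' % layer_name)
    return names.index(layer_name)


idx = layer_index(layer_names, 'conv1/7x7_s2')
print(idx)  # the parameter blobs would then be net.layers[idx].blobs
```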

Update regarding the use of gradients to visualize filter responses:
A gradient is normally defined for a scalar function. The loss is a scalar, and therefore you can speak of the gradient of the scalar loss with respect to a pixel or filter weight. This gradient is a single number per pixel or filter weight.
If you want the input that maximally activates a specific internal hidden node, you need an "auxiliary" net whose loss is exactly a measure of the activation of the specific hidden node you want to visualize. Once you have this auxiliary net, you can start from an arbitrary input and change this input based on the gradients of the auxiliary loss with respect to the input layer:

update = prev_in + lr * net.blobs['data'].diff
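The update rule can be tried out on a toy problem. The sketch below is plain Python, not Caffe: a single 1D filter stands in for the hidden node, its mean response stands in for the auxiliary loss, and a hand-written gradient function stands in for net.blobs['data'].diff after a backward pass. All names and values are made up for illustration.

```python
def filter_activation(x, w):
    """Scalar objective: mean response of one filter (valid cross-correlation)."""
    k = len(w)
    responses = [sum(w[i] * x[j + i] for i in range(k))
                 for j in range(len(x) - k + 1)]
    return sum(responses) / len(responses)


def activation_grad(x, w):
    """Gradient of filter_activation with respect to every input element."""
    k = len(w)
    n_out = len(x) - k + 1
    grad = [0.0] * len(x)
    for j in range(n_out):
        for i in range(k):
            grad[j + i] += w[i] / n_out
    return grad


w = [0.5, -1.0, 2.0]   # the filter to visualize
x = [0.0] * 8          # arbitrary starting input
lr = 0.5

history = [filter_activation(x, w)]
for _ in range(20):
    grad = activation_grad(x, w)
    # Same form as the update above: step the input along the gradient.
    x = [xi + lr * gi for xi, gi in zip(x, grad)]
    history.append(filter_activation(x, w))

print(history[0], history[-1])
```

Because this toy objective is linear in the input, the activation grows without bound; with a real network you would typically limit the number of steps or normalize the input between updates.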


answered Sep 30 '22 09:09

Shai
