How to reduce a fully-connected (`"InnerProduct"`) layer using truncated SVD

Tags:

machine-learning

neural-network

deep-learning

linear-algebra

caffe

In the paper Girshick, R., "Fast R-CNN" (ICCV 2015), section "3.1 Truncated SVD for faster detection", the author proposes using the SVD trick to reduce the size and computation time of a fully connected layer.

Given a trained model (deploy.prototxt and weights.caffemodel), how can I use this trick to replace a fully connected layer with a truncated one?


asked Nov 08 '16 07:11

Shai


Some linear-algebra background
Singular Value Decomposition (SVD) is a decomposition of any matrix W into three matrices:

W = U S V*

where U and V are orthonormal matrices, and S is diagonal, with the (non-negative) singular values on its diagonal sorted in decreasing order. One of the useful properties of SVD is that it makes it easy to approximate W with a lower-rank matrix: if you truncate S to keep only its k leading singular values (setting the rest of the diagonal to zero), then

W_app = U S_trunc V*

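is a rank-k approximation of W; by the Eckart-Young theorem it is the best rank-k approximation in both the Frobenius and the spectral norm. Since the zeroed singular values cancel the corresponding columns of U and rows of V*, W_app = U_k (S_k V_k*), where U_k holds the first k columns of U, S_k the k leading singular values, and V_k* the first k rows of V*. For an m×n matrix W this is the product of an m×k and a k×n matrix, so W_app x can be computed as two small matrix-vector products.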
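
A small NumPy sketch of the truncated SVD on a random matrix (the matrix size, the seed and k = 5 are arbitrary assumptions for illustration). Note that np.linalg.svd returns V* directly (called Vh here) and the singular values as a 1-D array sorted in decreasing order; the last line checks the Eckart-Young error formula, i.e. that the Frobenius error equals the norm of the discarded singular values.

```python
import numpy as np

def truncated_svd(W, k):
    """Return the rank-k approximation U_k S_k V_k* of W."""
    # full_matrices=False gives the thin SVD: U is (m, r), Vh is (r, n), r = min(m, n)
    U, s, Vh = np.linalg.svd(W, full_matrices=False)
    # keep only the k leading singular values and the matching singular vectors
    return U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

rng = np.random.default_rng(0)
W = rng.standard_normal((50, 30))  # assumed example matrix
k = 5
W_app = truncated_svd(W, k)

s = np.linalg.svd(W, compute_uv=False)  # singular values only
print(np.linalg.matrix_rank(W_app))     # rank of the approximation
# Frobenius error vs. norm of the discarded singular values (should match)
print(np.linalg.norm(W - W_app), np.sqrt(np.sum(s[k:] ** 2)))
```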

Using SVD to approximate a fully connected layer
Suppose we have a model, deploy_full.prototxt, with a fully connected layer:

# ... some layers here
layer {
  name: "fc_orig"
  type: "InnerProduct"
  bottom: "in"
  top: "out"
  inner_product_param {
    num_output: 1000
    # more params...
  }
  # some more...
}
# more layers...
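
For the surgery it matters how Caffe lays out the "InnerProduct" parameters: with the default (non-transposed) setting, params['fc_orig'][0] has shape (num_output, num_input) and params['fc_orig'][1] has shape (num_output,), and a batch of flattened inputs x of shape (N, num_input) is mapped to x W^T + b. A NumPy sketch of that forward pass, where the input size 256 and the batch size 8 are assumptions (the question does not give them):

```python
import numpy as np

def inner_product_forward(x, W, b):
    """Mimic Caffe's InnerProduct forward pass (default, non-transposed weights).

    x: (N, num_input) batch, already flattened
    W: (num_output, num_input) weight blob, i.e. net.params[name][0].data
    b: (num_output,) bias blob, i.e. net.params[name][1].data
    """
    return x @ W.T + b

num_input, num_output = 256, 1000  # num_input is an assumed size for illustration
rng = np.random.default_rng(0)
W = rng.standard_normal((num_output, num_input)).astype(np.float32)
b = rng.standard_normal(num_output).astype(np.float32)
x = rng.standard_normal((8, num_input)).astype(np.float32)

out = inner_product_forward(x, W, b)
print(out.shape)  # (8, 1000)
```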

We also have trained_weights_full.caffemodel, the trained parameters of the deploy_full.prototxt model.

Copy deploy_full.prototxt to deploy_svd.prototxt and open it in an editor of your choice. Replace the fully connected layer with these two layers:

layer {
  name: "fc_svd_U"
  type: "InnerProduct"
  bottom: "in" # same input
  top: "svd_interim"
  inner_product_param {
    num_output: 20  # rank k = 20 of the approximation
    bias_term: false
    # more params...
  }
  # some more...
}
# NO activation layer here!
layer {
  name: "fc_svd_V"
  type: "InnerProduct"
  bottom: "svd_interim"
  top: "out"   # same output
  inner_product_param {
    num_output: 1000  # original number of outputs
    # more params...
  }
  # some more...
}
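
Why two stacked linear layers can replace fc_orig: with the rank-k factorization W_app x = U_k (S_k V_k* x) (U_k, S_k, V_k*: the k leading singular vectors and values), the first layer, which reads "in", must hold S_k V_k* with shape (k, num_input), and the second layer, which writes "out", must hold U_k with shape (num_output, k) plus the original bias. Any nonlinearity between them breaks this identity. A NumPy sketch, with sizes chosen arbitrarily and ReLU used as an example activation:

```python
import numpy as np

rng = np.random.default_rng(1)
num_input, num_output, k = 64, 100, 20   # assumed sizes for illustration
W = rng.standard_normal((num_output, num_input))
b = rng.standard_normal(num_output)
x = rng.standard_normal((5, num_input))  # batch of 5 inputs

U, s, Vh = np.linalg.svd(W, full_matrices=False)
W1 = np.diag(s[:k]) @ Vh[:k, :]  # first layer ("in" -> "svd_interim"), no bias: (k, num_input)
W2 = U[:, :k]                    # second layer ("svd_interim" -> "out"), keeps b: (num_output, k)

interim = x @ W1.T               # output of the first layer, shape (5, k)
out_svd = interim @ W2.T + b     # output of the second layer, shape (5, num_output)

# Same result as a single layer using the rank-k matrix W_app = U_k S_k V_k*
W_app = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]
print(np.allclose(out_svd, x @ W_app.T + b))

# With a ReLU between the two layers, the cascade no longer equals x @ W_app.T + b
out_relu = np.maximum(interim, 0) @ W2.T + b
print(np.allclose(out_relu, x @ W_app.T + b))
```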

In Python, a little net surgery to fill the two new layers with the truncated SVD factors:

import caffe
import numpy as np

orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
# layers with matching names are copied from the full model; fc_svd_U and fc_svd_V are new
svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
# get the original weight matrix, shape (num_output, num_input) = (1000, num_input)
W = np.array(orig_net.params['fc_orig'][0].data)
# thin SVD: W = U diag(s) Vh, where Vh is V*
k = 20  # same as num_output of fc_svd_U
U, s, Vh = np.linalg.svd(W, full_matrices=False)
# W x ~= U_k (S_k V_k* x), using the k leading singular values/vectors:
# the first layer (fc_svd_U, reads "in") applies S_k V_k* despite its name,
# the second layer (fc_svd_V, writes "out") applies U_k and adds the original bias
svd_net.params['fc_svd_U'][0].data[...] = np.dot(np.diag(s[:k]), Vh[:k, :])  # (k, num_input)
svd_net.params['fc_svd_V'][0].data[...] = U[:, :k]  # (1000, k)
svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data  # same bias
# save the new weights
svd_net.save('trained_weights_svd.caffemodel')
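
The script hardcodes k = 20. One common heuristic for picking k (a swapped-in rule of thumb, not the procedure from the paper) is to take the smallest k whose leading singular values retain a given fraction of sum(s**2), the squared Frobenius norm of W. A sketch, where choose_rank and the 0.9 threshold are hypothetical choices:

```python
import numpy as np

def choose_rank(W, energy=0.9):
    """Hypothetical helper: smallest k whose leading singular values
    retain at least `energy` of sum(s**2) (the squared Frobenius norm of W)."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values, decreasing
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    # first index where the retained fraction reaches the target; +1 turns it into a count
    k = int(np.searchsorted(cumulative, energy) + 1)
    # guard against rounding leaving cumulative[-1] slightly below 1.0
    return min(k, len(s))

W_toy = np.diag([4.0, 2.0, 1.0, 1.0])  # squared singular values 16, 4, 1, 1 (total 22)
print(choose_rank(W_toy, 0.9))  # 2, since (16 + 4) / 22 ~= 0.91
```

With a real layer one would call something like k = choose_rank(W, 0.9) before the surgery; the resulting k must then also be written into num_output of fc_svd_U in deploy_svd.prototxt.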

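Now we have deploy_svd.prototxt with trained_weights_svd.caffemodel, which approximate the original net with far fewer multiplications and weights: the 1000·n_in weights of fc_orig (n_in being its input dimension) are replaced by k·(n_in + 1000) = 20·(n_in + 1000) weights in the two new layers, while the 1000 biases stay the same.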
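
To make the savings concrete: a layer with n_in inputs and n_out outputs has n_in·n_out weights (and as many multiply-adds per input sample), while the truncated pair has k·n_in + n_out·k; the biases are identical in both. A sketch of the arithmetic, where n_in = 4096 is an assumed input size (the question does not state it):

```python
def fc_cost(n_in, n_out):
    """Weights (= multiply-adds per sample) of one InnerProduct layer, biases excluded."""
    return n_in * n_out

def svd_cost(n_in, n_out, k):
    """Weights of the truncated pair: (k x n_in) followed by (n_out x k)."""
    return k * n_in + n_out * k

n_in, n_out, k = 4096, 1000, 20  # n_in is an assumed size
full = fc_cost(n_in, n_out)      # 4096000
svd = svd_cost(n_in, n_out, k)   # 81920 + 20000 = 101920
print(full, svd, full / svd)
```

The factorization only pays off when k < n_in·n_out / (n_in + n_out); for k close to that bound the two layers cost about as much as the original one.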

answered Nov 15 '22 08:11

Shai
