
Auto-encoders with tied weights in Caffe

From my understanding, normally an auto-encoder uses tied weights in the encoding and decoding networks right?

I took a look at Caffe's auto-encoder example, but I didn't see how the weights are tied. I noticed that the encoding and decoding networks share the same blobs, but how is it guaranteed that the weights are updated correctly?

How to implement tied weights auto-encoders in Caffe?

asked Aug 25 '16 by dontloo

1 Answer

While there's a history of using tied weights in auto-encoders, nowadays they are rarely used (to the best of my knowledge), which I believe is why this Caffe example doesn't use them.

Nevertheless, Caffe does support auto-encoders with tied weights, and it is possible using two features: parameter sharing between layers, and the transpose flag of the fully-connected layer (InnerProduct in Caffe). More specifically, two layers share a parameter in Caffe if it is given the same name in both, which is specified under the param field like so:

layer {
  name: "encode1"
  type: "InnerProduct"
  bottom: "data"
  top: "encode1"
  param {
    name: "encode1_matrix"
    lr_mult: 1
    decay_mult: 1
  }
  param {
    name: "encode1_bias"
    lr_mult: 1
    decay_mult: 0
  }
  inner_product_param {
    num_output: 128
    weight_filler {
      type: "gaussian"
      std: 1
      sparse: 15
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

If another fully-connected layer (with matching dimensions) uses the names "encode1_matrix" and "encode1_bias", then those parameters will always be identical, and Caffe will take care of aggregating the gradients and updating the parameters correctly. The second feature is the transpose flag of the fully-connected layer, which causes the shared matrix to be transposed before it is multiplied by the layer's input. So, extending the above example, if we want a fully-connected layer in the decoding path that shares its weight matrix with "encode1_matrix", we define it like so:

layer {
  name: "decode1"
  type: "InnerProduct"
  bottom: "encode1"
  top: "decode1"
  param {
    name: "encode1_matrix"
    lr_mult: 1
    decay_mult: 1
  }
  param {
    name: "decode1_bias"
    lr_mult: 1
    decay_mult: 0
  }
  inner_product_param {
    num_output: 784
    transpose: true
    weight_filler {
      type: "gaussian"
      std: 1
      sparse: 15
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

Notice that the bias parameters are not shared (they cannot be, since the output dimensions differ), while the weight matrix is shared and the decoder layer uses the transpose flag, which completes the tied auto-encoder architecture.
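To make the arithmetic concrete, here is a minimal NumPy sketch (not Caffe code) of what the two layers above compute. Caffe's InnerProduct stores its weight blob as (num_output, input_dim), so the shared "encode1_matrix" blob has shape (128, 784); with transpose: true the decoder effectively multiplies by its transpose. The variable names here are illustrative, not Caffe identifiers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared weight blob ("encode1_matrix"): shape (num_output, input_dim) = (128, 784)
W = rng.normal(size=(128, 784))
b_enc = np.zeros(128)   # "encode1_bias" (128 outputs)
b_dec = np.zeros(784)   # "decode1_bias" (784 outputs) -- cannot be shared

def encode(x):
    # encode1: InnerProduct with num_output=128 -> h = W x + b_enc
    return W @ x + b_enc

def decode(h):
    # decode1: InnerProduct with num_output=784 and transpose: true,
    # reusing the shared blob as W^T -> x' = W^T h + b_dec
    return W.T @ h + b_dec

x = rng.normal(size=784)
reconstruction = decode(encode(x))
assert reconstruction.shape == (784,)
```

During training, Caffe sums the gradients flowing into the shared blob from both layers before the update, which is exactly the tied-weights behavior.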

See here for a complete working example of a tied auto-encoder using Caffe: https://gist.github.com/orsharir/beb479d9ad5d8e389800c47c9ec42840

answered Oct 25 '22 by Or Sharir