
Torch / Lua, which neural network structure for mini-batch training?

I'm still working on implementing the mini-batch gradient update for my siamese neural network. Previously I had an implementation problem that was correctly solved here.

Now I have realized that there was also a mistake in the architecture of my neural network, related to my incomplete understanding of the correct implementation.

So far, I've always used a non-mini-batch gradient descent approach, in which I was passing the training elements one by one to the gradient update. Now I want to implement a gradient update through mini-batches, starting, say, with mini-batches of N=2 elements.

My question is: how should I change the architecture of my siamese neural network to make it able to handle a mini-batch of N=2 elements instead of a single element?

This is the (simplified) architecture of my siamese neural network:

nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): nn.ParallelTable {
    input
      |`-> (1): nn.Sequential {
      |      [input -> (1) -> (2) -> output]
      |      (1): nn.Linear(6 -> 3)
      |      (2): nn.Linear(3 -> 2)
      |    }
      |`-> (2): nn.Sequential {
      |      [input -> (1) -> (2) -> output]
      |      (1): nn.Linear(6 -> 3)
      |      (2): nn.Linear(3 -> 2)
      |    }
       ... -> output
  }
  (2): nn.CosineDistance
}

I have:

  • 2 identical parallel networks (upper and lower) that share their weights
  • 6 input units
  • 3 hidden units
  • 2 output units
  • a cosine distance layer that compares the outputs of the two parallel networks

Here's my code:

perceptronUpper = nn.Sequential()
perceptronUpper:add(nn.Linear(input_number, hiddenUnits))
perceptronUpper:add(nn.Linear(hiddenUnits, output_number))
-- clone the upper branch and share its parameters and gradients,
-- so both branches stay identical during training
perceptronLower = perceptronUpper:clone('weight', 'bias', 'gradWeight', 'gradBias')

parallel_table = nn.ParallelTable()
parallel_table:add(perceptronUpper)
parallel_table:add(perceptronLower)

perceptron = nn.Sequential()
perceptron:add(parallel_table)
perceptron:add(nn.CosineDistance())

This architecture works very well if I have a gradient update function that takes one element at a time; how should I modify it so that it can handle a mini-batch?

EDIT: I should probably use the nn.Sequencer() class, by modifying the last two lines of my code to:

perceptron:add(nn.Sequencer(parallel_table))
perceptron:add(nn.Sequencer(nn.CosineDistance()))

What do you guys think?

Asked by DavideChicco.it

1 Answer

Every nn module can work with mini-batches. Some work only with mini-batches, e.g. (Spatial)BatchNormalization. A module knows how many dimensions its input must contain (let's say D), and if it receives a (D+1)-dimensional tensor, it assumes the first dimension to be the batch dimension. For example, take a look at the nn.Linear module documentation:

The input tensor given in forward(input) must be either a vector (1D tensor) or matrix (2D tensor). If the input is a matrix, then each row is assumed to be an input sample of given batch.

The helper below stacks a table of 1D tensors into a single 2D batch tensor:

-- Stack a table of equally-sized tensors into one tensor
-- with an extra leading batch dimension.
function table_of_tensors_to_batch(tbl)
    local batch = torch.Tensor(#tbl, unpack(tbl[1]:size():totable()))
    for i = 1, #tbl do
        batch[i] = tbl[i]
    end
    return batch
end

inputs = {
    torch.Tensor(5):fill(1),
    torch.Tensor(5):fill(2),
    torch.Tensor(5):fill(3),
}
input_batch = table_of_tensors_to_batch(inputs)
linear = nn.Linear(5, 2)
output_batch = linear:forward(input_batch)

print(input_batch)
 1  1  1  1  1
 2  2  2  2  2
 3  3  3  3  3
[torch.DoubleTensor of size 3x5]

print(output_batch)
 0.3128 -1.1384
 0.7382 -2.1815
 1.1637 -3.2247
[torch.DoubleTensor of size 3x2]

Ok, but what about containers (nn.Sequential, nn.Parallel, nn.ParallelTable and others)? A container does not process the input itself; it just passes the input (or the corresponding part of it) to the module(s) it contains. nn.ParallelTable, for example, simply applies its i-th member module to the i-th element of the input table. Thus, if you want it to handle a batch, each input[i] (where input is a table) must be a tensor with the batch dimension, as described above.

input_number = 5
output_number = 2

inputs1 = {
    torch.Tensor(5):fill(1),
    torch.Tensor(5):fill(2),
    torch.Tensor(5):fill(3),
}
inputs2 = {
    torch.Tensor(5):fill(4),
    torch.Tensor(5):fill(5),
    torch.Tensor(5):fill(6),
}
input1_batch = table_of_tensors_to_batch(inputs1)
input2_batch = table_of_tensors_to_batch(inputs2)

input_batch = {input1_batch, input2_batch}
output_batch = perceptron:forward(input_batch)

print(input_batch)
{
  1 : DoubleTensor - size: 3x5
  2 : DoubleTensor - size: 3x5
}
print(output_batch)
 0.6490
 0.9757
 0.9947
[torch.DoubleTensor of size 3]


target_batch = torch.Tensor({1, 0, 1})
criterion = nn.MSECriterion()
err = criterion:forward(output_batch, target_batch)
gradCriterion = criterion:backward(output_batch, target_batch)
perceptron:zeroGradParameters()
perceptron:backward(input_batch, gradCriterion)
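
To complete the mini-batch gradient step after the backward pass, the shared parameters still have to be updated. A minimal sketch, assuming plain SGD with an arbitrarily chosen learning rate (neither is specified in the original post):

-- hypothetical follow-up step: vanilla SGD update after backward()
learningRate = 0.01  -- illustrative value only
perceptron:updateParameters(learningRate)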

Why is there nn.Sequencer then? Can one use it instead? Yes, but it is strongly discouraged. Sequencer takes a table of sequence elements and applies the module to each element independently, providing no speedup. Besides, it has to make copies of that module, so such a "batch mode" is considerably less efficient than online (non-batch) training. Sequencer was designed to be a part of recurrent nets; there is no point in using it in your case.
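
For illustration only, here is a minimal sketch of that behaviour (nn.Sequencer comes from the rnn package); note that it consumes a table of tensors and returns a table of tensors, rather than a single batched tensor:

require 'rnn'  -- nn.Sequencer lives in the rnn package

seq = nn.Sequencer(nn.Linear(5, 2))

-- a "sequence" is a plain Lua table of tensors; each element is forwarded
-- through (a clone of) the wrapped module independently
outputs = seq:forward({torch.Tensor(5):fill(1), torch.Tensor(5):fill(2)})
print(outputs)  -- a table of two 1D tensors of size 2, not a 2x2 batch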

Answered by Alexander Lutsenko