I'm still working on implementing the mini-batch gradient update for my siamese neural network. Previously I had an implementation problem that was correctly solved here.
Now I've realized that there was also a mistake in the architecture of my neural network, related to my incomplete understanding of the correct implementation.
So far I've always used a non-mini-batch gradient descent approach, passing the training elements one by one to the gradient update. Now I want to implement a mini-batch gradient update, starting, say, with mini-batches of N=2 elements.
My question is: how should I change the architecture of my siamese neural network so that it can handle a mini-batch of N=2 elements instead of a single element?
This is the (simplified) architecture of my siamese neural network:
nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): nn.ParallelTable {
    input
      |`-> (1): nn.Sequential {
      |      [input -> (1) -> (2) -> output]
      |      (1): nn.Linear(6 -> 3)
      |      (2): nn.Linear(3 -> 2)
      |    }
      |`-> (2): nn.Sequential {
      |      [input -> (1) -> (2) -> output]
      |      (1): nn.Linear(6 -> 3)
      |      (2): nn.Linear(3 -> 2)
      |    }
       ... -> output
  }
  (2): nn.CosineDistance
}
Here's my code:
-- the upper and lower towers share their weights, biases and gradients
perceptronUpper = nn.Sequential()
perceptronUpper:add(nn.Linear(input_number, hiddenUnits))
perceptronUpper:add(nn.Linear(hiddenUnits, output_number))
perceptronLower = perceptronUpper:clone('weight', 'bias', 'gradWeight', 'gradBias')

parallel_table = nn.ParallelTable()
parallel_table:add(perceptronUpper)
parallel_table:add(perceptronLower)

perceptron = nn.Sequential()
perceptron:add(parallel_table)
perceptron:add(nn.CosineDistance())
This architecture works very well with a gradient update function that takes one element at a time; how should I modify it so that it can handle a mini-batch?
EDIT: I probably should use the nn.Sequencer() class, modifying the last two lines of my code to:
perceptron:add(nn.Sequencer(parallel_table))
perceptron:add(nn.Sequencer(nn.CosineDistance()))
What do you guys think?
Every nn module can work with mini-batches. Some work only with mini-batches, e.g. (Spatial)BatchNormalization. A module knows how many dimensions its input must contain (let's say D), and if the module receives a (D+1)-dimensional tensor, it assumes the first dimension to be the batch dimension. For example, take a look at the nn.Linear module documentation:
The input tensor given in forward(input) must be either a vector (1D tensor) or matrix (2D tensor). If the input is a matrix, then each row is assumed to be an input sample of given batch.
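Here's a small helper that stacks a table of equally sized tensors into a single batch tensor, with the first dimension indexing the samples: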
function table_of_tensors_to_batch(tbl)
   local batch = torch.Tensor(#tbl, unpack(tbl[1]:size():totable()))
   for i = 1, #tbl do
      batch[i] = tbl[i]
   end
   return batch
end

inputs = {
   torch.Tensor(5):fill(1),
   torch.Tensor(5):fill(2),
   torch.Tensor(5):fill(3),
}
input_batch = table_of_tensors_to_batch(inputs)

linear = nn.Linear(5, 2)
output_batch = linear:forward(input_batch)
print(input_batch)
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
[torch.DoubleTensor of size 3x5]
print(output_batch)
 0.3128 -1.1384
 0.7382 -2.1815
 1.1637 -3.2247
[torch.DoubleTensor of size 3x2]
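For comparison, the same linear module also accepts a single sample without the batch dimension; a quick sketch (the output values depend on the random initialization):
single_input = torch.Tensor(5):fill(1)        -- 1D tensor: one sample, no batch dimension
single_output = linear:forward(single_input)  -- 1D output tensor of size 2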
OK, but what about containers (nn.Sequential, nn.Parallel, nn.ParallelTable and others)? A container itself does not deal with the input; it just sends the input (or the corresponding part of it) to the corresponding module it contains. ParallelTable, for example, simply applies the i-th member module to the i-th input table element. Thus, if you want it to handle a batch, each input[i] (input is a table) must be a tensor with the batch dimension, as described above.
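Applied to your siamese perceptron (assumed to be built exactly as in your question, here with input_number = 5):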
input_number = 5
output_number = 2

inputs1 = {
   torch.Tensor(5):fill(1),
   torch.Tensor(5):fill(2),
   torch.Tensor(5):fill(3),
}
inputs2 = {
   torch.Tensor(5):fill(4),
   torch.Tensor(5):fill(5),
   torch.Tensor(5):fill(6),
}
input1_batch = table_of_tensors_to_batch(inputs1)
input2_batch = table_of_tensors_to_batch(inputs2)

input_batch = {input1_batch, input2_batch}
output_batch = perceptron:forward(input_batch)
print(input_batch)
{
  1 : DoubleTensor - size: 3x5
  2 : DoubleTensor - size: 3x5
}

print(output_batch)
 0.6490
 0.9757
 0.9947
[torch.DoubleTensor of size 3]
-- targets: the desired cosine similarity for each of the three input pairs
target_batch = torch.Tensor({1, 0, 1})

criterion = nn.MSECriterion()
err = criterion:forward(output_batch, target_batch)
gradCriterion = criterion:backward(output_batch, target_batch)

perceptron:zeroGradParameters()
perceptron:backward(input_batch, gradCriterion)
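At this point the shared gradients have been accumulated over the whole mini-batch, and you only need to apply them to complete one mini-batch gradient update. A minimal sketch, assuming the perceptron and criterion above and a hypothetical learning_rate value:
-- one full mini-batch update step; learning_rate is a hypothetical value to tune
function minibatch_step(model, criterion, input_batch, target_batch, learning_rate)
   model:zeroGradParameters()
   local output = model:forward(input_batch)
   local err = criterion:forward(output, target_batch)
   local gradOutput = criterion:backward(output, target_batch)
   model:backward(input_batch, gradOutput)
   model:updateParameters(learning_rate)  -- vanilla SGD: params = params - learning_rate * gradParams
   return err
end

err = minibatch_step(perceptron, criterion, input_batch, target_batch, 0.01)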
Why is there nn.Sequencer then? Can one use it instead? Yes, but it is strongly discouraged. Sequencer takes a sequence table and applies the module to each element of the table independently, providing no speedup. Besides, it has to make copies of that module, so such a "batch mode" is considerably less efficient than online (non-batch) training. Sequencer was designed to be a part of recurrent nets; there is no point in using it in your case.