
Pytorch softmax: What dimension to use?




The function torch.nn.functional.softmax takes two parameters: input and dim. According to its documentation, the softmax operation is applied to all slices of input along the specified dim, and will rescale them so that the elements lie in the range (0, 1) and sum to 1.

Let input be:

input = torch.randn((3, 4, 5, 6)) 

Suppose I want the following to hold after applying softmax, so that every entry in the resulting array is 1:

sum = torch.sum(input, dim = 3) # sum's size is (3, 4, 5); pass keepdim=True to get (3, 4, 5, 1)

How should I apply softmax?

softmax(input, dim = 0) # Way Number 0
softmax(input, dim = 1) # Way Number 1
softmax(input, dim = 2) # Way Number 2
softmax(input, dim = 3) # Way Number 3

My intuition tells me that it is the last one, but I am not sure. English is not my first language, and the use of the word "along" seemed confusing to me because of that.

I am not very clear on what "along" means, so I will use an example that could clarify things. Suppose we have a tensor of size (s1, s2, s3, s4), and I want this to happen

Jadiel de Armas asked Feb 28 '18 19:02

People also ask

What does softmax dim=-1 mean?

softmax(x, dim=-1) The dim argument is required unless your input tensor is a vector. It specifies the axis along which to apply the softmax activation. Passing in dim=-1 applies softmax to the last dimension. So, after you do this, the elements of the last dimension will sum to 1.
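
As a quick check of the statement above, the following sketch compares dim=-1 with the explicit index of the last dimension and confirms that the slices along the last dimension sum to 1. The tensor shape, the seed, and the variable names are arbitrary choices made for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)        # arbitrary seed, only for reproducibility
x = torch.randn(2, 3, 4)    # illustrative shape, not taken from the question

last = F.softmax(x, dim=-1)               # negative index: last dimension
explicit = F.softmax(x, dim=x.dim() - 1)  # same dimension, written explicitly

same_result = torch.allclose(last, explicit)

# Summing over the softmax dimension removes it, leaving shape (2, 3).
sums = last.sum(dim=-1)
sums_to_one = torch.allclose(sums, torch.ones_like(sums))

print(same_result, sums_to_one)
```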

Which layer is softmax generally used?

The softmax function is used as the activation function in the output layer of neural network models that predict a multinomial probability distribution. That is, softmax is used as the activation function for multi-class classification problems where class membership is required on more than two class labels.

Is softmax a fully connected layer?

The main purpose of the softmax function is to transform the (unnormalised) output of K units (which is e.g. represented as a vector of K elements) of a fully-connected layer to a probability distribution (a normalised output), which is often represented as a vector of K elements, each of which is between 0 and 1 (a ...

1 Answer

Steven's answer is not correct. See the snapshot below. It is actually the reverse way.


Image transcribed as code:

>>> import torch
>>> import torch.nn.functional as F
>>> x = torch.tensor([[1,2],[3,4]],dtype=torch.float)
>>> F.softmax(x,dim=0)
tensor([[0.1192, 0.1192],
        [0.8808, 0.8808]])
>>> F.softmax(x,dim=1)
tensor([[0.2689, 0.7311],
        [0.2689, 0.7311]])
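
The same check can be run on the tensor from the question. This is a minimal sketch, assuming the goal is for the sum over dim = 3 of the softmax output to contain only ones. The helper name sums_to_one_along and the variable names are hypothetical and not part of PyTorch.

```python
import torch
import torch.nn.functional as F

def sums_to_one_along(t, dim):
    """Hypothetical helper: True if every slice of t along `dim` sums to 1."""
    s = t.sum(dim=dim)
    return torch.allclose(s, torch.ones_like(s))

torch.manual_seed(0)            # arbitrary seed, only for reproducibility
inp = torch.randn(3, 4, 5, 6)   # same shape as in the question

results = {}
for d in range(inp.dim()):
    out = F.softmax(inp, dim=d)
    # Always sum over dim 3, as the question does, whatever dim softmax used.
    results[d] = sums_to_one_along(out, dim=3)
    print(d, tuple(out.sum(dim=3).shape), results[d])
```

Under these assumptions only dim = 3 (equivalently dim = -1) passes the check: softmax normalizes over the dimension it is given, so that is the dimension whose sum equals 1.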
sww answered Oct 13 '22 17:10
