 

Loss Function & Its Inputs For Binary Classification PyTorch

I'm trying to write a neural network for binary classification in PyTorch and I'm confused about the loss function.

I see that BCELoss is a common function specifically geared for binary classification. I also see that an output layer of N outputs for N possible classes is standard for general classification. However, for binary classification it seems like it could be either 1 or 2 outputs.

So, should I have 2 outputs (1 for each label) and then convert my 0/1 training labels into [1,0] and [0,1] arrays, or use something like a sigmoid for a single-variable output?

Here are the relevant snippets of code:

self.outputs = nn.Linear(NETWORK_WIDTH, 2) # 1 or 2 dimensions?


def forward(self, x):
  # other layers omitted
  x = self.outputs(x)           
  return F.log_softmax(x)  # <<< softmax over multiple vars, sigmoid over one, or other?

criterion = nn.BCELoss() # <<< Is this the right function?

net_out = net(data)
loss = criterion(net_out, target) # <<< Should target be an integer label or 1-hot vector?

Thanks in advance.

asked Dec 05 '18 by GenTel

1 Answer

For binary outputs you can use 1 output unit, so then:

self.outputs = nn.Linear(NETWORK_WIDTH, 1)

Then you use a sigmoid activation to map the value of your output unit into the range (0, 1). Your training labels then need to match this layout: float 0/1 values with the same shape as the output.

def forward(self, x):
    # other layers omitted
    x = self.outputs(x)           
    return torch.sigmoid(x)  

Finally you can use the torch.nn.BCELoss:

criterion = nn.BCELoss()

net_out = net(data)
loss = criterion(net_out, target)

This should work fine for you.
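To make the shapes concrete, here is a minimal self-contained sketch of the setup above (the network size and batch data are illustrative, not from the question): BCELoss expects the target to be a float tensor with the same shape as the network output, holding 0.0/1.0 labels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

NETWORK_WIDTH = 8  # illustrative size, not from the question
net = nn.Sequential(nn.Linear(NETWORK_WIDTH, 1), nn.Sigmoid())

data = torch.randn(4, NETWORK_WIDTH)             # batch of 4 samples
target = torch.tensor([[0.], [1.], [1.], [0.]])  # float 0/1 labels, shape (4, 1)

criterion = nn.BCELoss()
net_out = net(data)                # shape (4, 1), values in (0, 1)
loss = criterion(net_out, target)  # scalar loss
```

Note that the target is a float tensor, not integer class indices; BCELoss will raise an error if the target dtype or shape does not match the output.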

You can also use torch.nn.BCEWithLogitsLoss; this loss function already applies the sigmoid internally, so you can leave it out of your forward. This is also the more numerically stable option.
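A sketch of that variant (layer size and data are again illustrative): the model returns raw logits, and the sigmoid lives inside the loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

NETWORK_WIDTH = 8                   # illustrative size
net = nn.Linear(NETWORK_WIDTH, 1)   # no sigmoid in forward

data = torch.randn(4, NETWORK_WIDTH)
target = torch.tensor([[0.], [1.], [1.], [0.]])  # float labels, same shape as output

criterion = nn.BCEWithLogitsLoss()  # applies sigmoid internally
loss = criterion(net(data), target)
```

At inference time you then apply torch.sigmoid (or threshold the logit at 0) yourself to get probabilities or class predictions.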

If you want to use 2 output units, that is also possible. But then you need to use torch.nn.CrossEntropyLoss instead of BCELoss; the softmax activation is already included in that loss function.
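A sketch of the 2-output variant (sizes illustrative): CrossEntropyLoss takes raw logits of shape (batch, num_classes) and integer class indices of shape (batch,), not one-hot vectors, so no softmax in the forward and no label conversion to [1,0]/[0,1] arrays.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

NETWORK_WIDTH = 8                    # illustrative size
net = nn.Linear(NETWORK_WIDTH, 2)    # 2 output units, no softmax in forward

data = torch.randn(4, NETWORK_WIDTH)
target = torch.tensor([0, 1, 1, 0])  # integer class indices, shape (4,)

criterion = nn.CrossEntropyLoss()    # log-softmax + NLL computed internally
loss = criterion(net(data), target)
```

This answers the one-hot question directly: with CrossEntropyLoss you keep the 0/1 integer labels as-is.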


Edit: I just want to emphasize that there is a real difference between the two approaches. Using 2 output units gives the final layer twice as many weights as using 1 output unit, so these two alternatives are not equivalent.

answered Nov 12 '22 by MBT