I am trying to solve a simple binary classification problem using an LSTM, and I am trying to figure out the correct loss function for the network. The issue is that when I use binary cross-entropy as the loss function, the loss values for training and testing are relatively high compared with using the mean squared error (MSE) function.
Upon research, I came across justifications that binary cross-entropy should be used for classification problems and MSE for regression problems. However, in my case, I am getting better accuracy and a lower loss value with MSE for binary classification.
I am not sure how to justify these results. Why not use mean squared error for classification problems?
Cross-entropy (sometimes called softmax loss when paired with a softmax output) is a better measure than MSE for classification, because the decision boundary in a classification task is large in comparison with regression.
One of the main reasons why MSE does not work well with logistic regression is that when the MSE loss function is plotted with respect to the weights of the logistic regression model, the resulting curve is not convex, which makes it very difficult to find the global minimum. The non-convexity arises because the linear output is squashed through a sigmoid before the error is squared.
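This can be checked numerically. The sketch below (my own illustration, using a single assumed training example with x = 1 and label y = 1) estimates the second derivative of each loss with respect to the weight w by finite differences: MSE through a sigmoid has regions of negative curvature (non-convex), while binary cross-entropy stays convex.

```python
import numpy as np

def sigmoid(w):
    return 1.0 / (1.0 + np.exp(-w))

def mse(w, y=1.0):
    # squared error of the sigmoid output against the label
    return (sigmoid(w) - y) ** 2

def bce(w, y=1.0):
    # binary cross-entropy of the sigmoid output against the label
    p = sigmoid(w)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def second_diff(f, w, h=1e-3):
    # central finite-difference estimate of the second derivative
    return (f(w + h) - 2 * f(w) + f(w - h)) / h**2

ws = np.linspace(-6, 6, 121)
mse_curv = [second_diff(mse, w) for w in ws]
bce_curv = [second_diff(bce, w) for w in ws]

print(any(c < 0 for c in mse_curv))   # MSE curvature goes negative: non-convex
print(all(c >= 0 for c in bce_curv))  # BCE curvature stays non-negative: convex
```

With a non-convex loss surface, gradient descent can stall in flat regions far from the optimum, which is the practical cost of using MSE here.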
In statistics and signal processing, a minimum mean square error (MMSE) estimator is an estimation method that minimizes the mean square error (MSE) of the fitted values of a dependent variable; MSE is a common measure of estimator quality.
RMSE is a weak evaluation metric for multi-class classification, and also for regression with widely scattered targets, even though "RMSE evaluation" is commonly used when teaching the basic concepts of classification and regression.
I would like to show this using an example. Assume a 6-class classification problem.
Assume, True probabilities = [1, 0, 0, 0, 0, 0]
Case 1: Predicted probabilities = [0.2, 0.16, 0.16, 0.16, 0.16, 0.16]
Case 2: Predicted probabilities = [0.4, 0.5, 0.1, 0, 0, 0]
The MSE in Case 1 and Case 2 is 0.128 and 0.1033, respectively.
Although Case 1 correctly predicts class 1 for the instance, its loss is higher than the loss in Case 2.
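A short sketch reproducing the numbers above: MSE assigns the lower loss to Case 2 even though only Case 1 puts the highest probability on the true class.

```python
import numpy as np

# true one-hot label and the two predicted distributions from the example
y_true = np.array([1, 0, 0, 0, 0, 0], dtype=float)
case1  = np.array([0.2, 0.16, 0.16, 0.16, 0.16, 0.16])
case2  = np.array([0.4, 0.5, 0.1, 0.0, 0.0, 0.0])

mse1 = np.mean((y_true - case1) ** 2)
mse2 = np.mean((y_true - case2) ** 2)

print(round(mse1, 4))  # 0.128
print(round(mse2, 4))  # 0.1033
print(np.argmax(case1) == np.argmax(y_true))  # True: Case 1 classifies correctly
print(mse1 > mse2)                            # True: yet its MSE is higher
```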
Though @nerd21 gives a good example of why MSE is a bad loss function for 6-class classification, the same does not hold for binary classification.
Let's just consider binary classification, with label [1, 0]. Let one prediction be h1 = [p, 1-p] and another be h2 = [q, 1-q]. Their sums of squared errors are:
L1 = 2(1-p)^2, L2 = 2(1-q)^2
Assume h1 is a misclassification, i.e. p < 1-p, so 0 < p < 0.5.
Assume h2 is a correct classification, i.e. q > 1-q, so 0.5 < q < 1.
Then L1 - L2 = 2[(1-p)^2 - (1-q)^2] = 2(p-q)(p+q-2).
Since p < 0.5 < q, we have p - q < 0 for sure. Also p + q < 0.5 + 1 = 1.5, so p + q - 2 < -0.5 < 0.
The product of two negative factors is positive, thus L1 - L2 > 0, i.e. L1 > L2.
This means that for binary classification with MSE as the loss function, a misclassification always incurs a larger loss than a correct classification.
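The inequality can be sanity-checked numerically. The sketch below samples many misclassified predictions p in (0, 0.5) and correctly classified predictions q in [0.5, 1) and confirms L1 > L2 in every case:

```python
import numpy as np

# label is [1, 0]; h1 = [p, 1-p] is misclassified, h2 = [q, 1-q] is correct
rng = np.random.default_rng(0)
p = rng.uniform(0.0, 0.5, size=10_000)  # misclassified: p < 0.5
q = rng.uniform(0.5, 1.0, size=10_000)  # correct: q >= 0.5

L1 = 2 * (1 - p) ** 2  # sum of squared errors for h1
L2 = 2 * (1 - q) ** 2  # sum of squared errors for h2

print(np.all(L1 > L2))  # True: every misclassification loses more
```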