 

Why do neural networks work so well?

I understand all the computational steps of training a neural network with gradient descent using forwardprop and backprop, but I'm trying to wrap my head around why they work so much better than logistic regression.

For now all I can think of is:

A) the neural network can learn its own parameters

B) there are many more weights than in simple logistic regression, thus allowing for more complex hypotheses

Can someone explain why a neural network works so well in general? I am a relative beginner.

asked Jul 26 '16 by Danny Liu



3 Answers

Neural networks can have a large number of free parameters (the weights and biases between interconnected units), and this gives them the flexibility to fit highly complex data (when trained correctly) that simpler models cannot. This model complexity brings with it the problems of training such a complex network and of ensuring the resultant model generalises beyond the examples it is trained on (typically neural networks require larger volumes of training data than other models do).

Classically, logistic regression has been limited to binary classification with a linear classifier (although multi-class classification can easily be achieved with one-vs-all, one-vs-one and similar approaches, as sketched below, and there are kernelised variants of logistic regression that allow for non-linear classification tasks). Logistic regression is therefore typically applied to simpler, linearly-separable classification tasks, where only small amounts of training data are available.
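
As a quick illustration of the one-vs-all idea, here is a minimal sketch using scikit-learn (the iris dataset and the parameters are arbitrary choices of mine for demonstration):

    # One-vs-rest: fit one binary logistic regression per class; at
    # prediction time the class whose classifier scores highest wins.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.multiclass import OneVsRestClassifier

    X, y = load_iris(return_X_y=True)  # 3 classes
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    ovr.fit(X_train, y_train)
    print("one-vs-rest accuracy:", ovr.score(X_test, y_test))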

Models such as logistic regression and linear regression can be thought of as minimal neural networks: a single unit with a sigmoid or identity activation, the degenerate case of a multi-layer perceptron (check out this site for one explanation of how).
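
To make that concrete, here is a minimal sketch (NumPy only; the toy data and learning rate are arbitrary choices of mine) of logistic regression written as a one-layer network: a single sigmoid unit trained by gradient descent on the cross-entropy loss. Stacking hidden layers of such units in front of this output unit is what turns the model into a multi-layer perceptron.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))              # 200 samples, 2 features
    y = (X[:, 0] + X[:, 1] > 0).astype(float)  # linearly separable labels

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w, b, lr = np.zeros(2), 0.0, 0.1
    for _ in range(500):
        p = sigmoid(X @ w + b)           # forward pass: one "neuron"
        grad_w = X.T @ (p - y) / len(y)  # cross-entropy gradient wrt w
        grad_b = np.mean(p - y)          # cross-entropy gradient wrt b
        w -= lr * grad_w                 # gradient descent step
        b -= lr * grad_b

    acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
    print("training accuracy:", acc)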

To conclude, it's the model complexity that allows neural nets to solve more complex classification tasks and to have a broader application (particularly when applied to raw data such as image pixel intensities), but their complexity means that large volumes of training data are required and that training them can be a difficult task.

answered Oct 11 '22 by Mark


Recently, Dr. Naftali Tishby's idea of the Information Bottleneck as an explanation for the effectiveness of deep neural networks has been making the rounds in academic circles. His video explaining the idea (link below) can be rather dense, so I'll try to give a distilled, general form of the core idea to help build intuition.

https://www.youtube.com/watch?v=XL07WEc2TRI

To ground your thinking, visualize the MNIST task of classifying the digit in an image. For this, I am only talking about simple fully-connected neural networks (not the convolutional NNs typically used for MNIST).

The input to an NN contains information about the output hidden inside of it. Some function is needed to transform the input into the output form. Pretty obvious. The key shift in thinking needed to build better intuition is to treat the input as a signal carrying "information" (I won't go into information theory here). Some of this information is relevant for the task at hand (predicting the output). Think of the output as also being a signal with a certain amount of "information". The neural network tries to "successively refine" and compress the input signal's information to match the desired output signal. Think of each layer as cutting away the unnecessary parts of the input information, and keeping and/or transforming the output-relevant information along the way through the network. The fully-connected neural network transforms the input information into a form, in the final hidden layer, such that it is linearly separable by the output layer.
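
That last claim, that the final hidden layer ends up linearly separable, is easy to probe empirically. Here is a minimal sketch (scikit-learn; the make_moons dataset, the hidden sizes, and the seeds are arbitrary choices of mine, not anything from Dr. Tishby's talk) that trains a small fully-connected net and then fits a plain linear classifier on the last hidden layer's activations:

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

    mlp = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                        random_state=0).fit(X, y)

    # Re-run the forward pass up to the last hidden layer
    # (relu is MLPClassifier's default activation).
    h = X
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0)

    # The raw inputs are not linearly separable; the learned hidden
    # representation typically is (or very nearly so).
    print("linear model on raw inputs :", LogisticRegression().fit(X, y).score(X, y))
    print("linear model on hidden reps:", LogisticRegression().fit(h, y).score(h, y))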

This is a very high-level and fundamental interpretation of the NN, and I hope it helps you see things more clearly. If there are parts you'd like me to clarify, let me know.

There are other essential pieces in Dr. Tishby's work, such as how minibatch noise helps training, and how the weights of a neural network layer can be seen as doing a random walk within the constraints of the problem. These parts are a little more detailed, and I'd recommend first toying with neural networks and taking a course on information theory to help build your understanding.

answered Oct 11 '22 by Sriram Gopalakrishnan


Suppose you have a large dataset and you want to build a binary classification model for it. You now have the two options you pointed out:

  • Logistic Regression

  • Neural networks (consider a feed-forward network for now)

Each node in a neural network is associated with an activation function. For example, let's choose the sigmoid, since logistic regression also uses the sigmoid internally to make its decision.

Let's see how the decision boundary of logistic regression looks when applied to the data: [figure: logistic regression decision boundary]

See some of the green points sitting inside the red region?

Now let's see the decision boundary of the neural network (forgive me for using a different color): [figure: neural network decision boundary]

Why does this happen? Why is the decision boundary of a neural network so flexible that it gives more accurate results than logistic regression?

The answer to the question you asked, "Why do neural networks work so well?", is: because of the hidden units (hidden layers) and their representational power.

Let me put it this way. You have a logistic regression model and a neural network with, say, 100 neurons, each with a sigmoid activation. Now each neuron is equivalent to one logistic regression.

Now imagine a hundred logistic units trained together to solve one problem, versus a single logistic regression model. Because of these hidden layers, the decision boundary becomes far more flexible and yields better results.

While you are experimenting, you can add more neurons and see how the decision boundary changes. Logistic regression is the same as a neural network with a single neuron.

The above is just an example; neural networks can be trained to learn very complex decision boundaries.
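
If you want to reproduce the comparison in the plots above, here is a minimal sketch (scikit-learn; the make_moons dataset, the 100-unit layer, and the seeds are arbitrary choices of mine):

    from sklearn.datasets import make_moons
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=1000, noise=0.25, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # One sigmoid unit: a straight-line decision boundary.
    logreg = LogisticRegression().fit(X_train, y_train)

    # 100 hidden sigmoid units ("100 logistic regressions trained
    # together"): the boundary can bend to follow the data.
    nn = MLPClassifier(hidden_layer_sizes=(100,), activation='logistic',
                       max_iter=2000, random_state=0).fit(X_train, y_train)

    print("logistic regression accuracy:", logreg.score(X_test, y_test))
    print("neural network accuracy     :", nn.score(X_test, y_test))

Try changing hidden_layer_sizes and re-running to watch the boundary grow more or less flexible.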

answered Oct 11 '22 by Sumith