 

Why do neural networks work so well?

I understand all the computational steps of training a neural network with gradient descent using forwardprop and backprop, but I'm trying to wrap my head around why they work so much better than logistic regression.

For now all I can think of is:

A) the neural network can learn its own parameters

B) there are many more weights than in simple logistic regression, thus allowing for more complex hypotheses

Can someone explain why a neural network works so well in general? I am a relative beginner.

asked Jul 26 '16 by Danny Liu



3 Answers

Neural networks can have a large number of free parameters (the weights and biases between interconnected units), and this gives them the flexibility to fit highly complex data (when trained correctly) that simpler models cannot. This model complexity brings with it the problems of training such a complex network and of ensuring the resultant model generalises beyond the examples it is trained on (typically neural networks require larger volumes of training data than other models do).

Classically, logistic regression has been limited to binary classification with a linear classifier (although multi-class classification can easily be achieved with one-vs-all, one-vs-one and similar approaches, as sketched below, and there are kernelised variants of logistic regression that allow for non-linear classification tasks). Logistic regression is therefore typically applied to simpler, linearly-separable classification tasks, where only small amounts of training data are available.
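
As a quick illustration of the one-vs-all idea, here is a minimal sketch using scikit-learn (the iris dataset and the parameters are arbitrary choices of mine for demonstration):

    # One-vs-rest: fit one binary logistic regression per class; at
    # prediction time the class whose classifier scores highest wins.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.multiclass import OneVsRestClassifier

    X, y = load_iris(return_X_y=True)  # 3 classes
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    ovr.fit(X_train, y_train)
    print("one-vs-rest accuracy:", ovr.score(X_test, y_test))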

Models such as logistic regression and linear regression can be thought of as minimal neural networks: a single unit with a sigmoid or identity activation, the degenerate case of a multi-layer perceptron (check out this site for one explanation of how).
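
To make that concrete, here is a minimal sketch (NumPy only; the toy data and learning rate are arbitrary choices of mine) of logistic regression written as a one-layer network: a single sigmoid unit trained by gradient descent on the cross-entropy loss. Stacking hidden layers of such units in front of this output unit is what turns the model into a multi-layer perceptron.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))              # 200 samples, 2 features
    y = (X[:, 0] + X[:, 1] > 0).astype(float)  # linearly separable labels

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w, b, lr = np.zeros(2), 0.0, 0.1
    for _ in range(500):
        p = sigmoid(X @ w + b)           # forward pass: one "neuron"
        grad_w = X.T @ (p - y) / len(y)  # cross-entropy gradient wrt w
        grad_b = np.mean(p - y)          # cross-entropy gradient wrt b
        w -= lr * grad_w                 # gradient descent step
        b -= lr * grad_b

    acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
    print("training accuracy:", acc)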

To conclude, it's the model complexity that allows neural nets to solve more complex classification tasks and to have a broader application (particularly when applied to raw data such as image pixel intensities), but their complexity means that large volumes of training data are required and that training them can be a difficult task.

answered Oct 11 '22 by Mark


Recently, Dr. Naftali Tishby's idea of the Information Bottleneck as an explanation for the effectiveness of deep neural networks has been making the rounds in academic circles. His video explaining the idea (link below) can be rather dense, so I'll try to give a distilled, general form of the core idea to help build intuition.

https://www.youtube.com/watch?v=XL07WEc2TRI

To ground your thinking, visualize the MNIST task of classifying the digit in an image. For this, I am only talking about simple fully-connected neural networks (not the convolutional NNs typically used for MNIST).

The input to an NN contains information about the output hidden inside of it. Some function is needed to transform the input into the output form. Pretty obvious. The key shift in thinking needed to build better intuition is to treat the input as a signal carrying "information" (I won't go into information theory here). Some of this information is relevant for the task at hand (predicting the output). Think of the output as also being a signal with a certain amount of "information". The neural network tries to "successively refine" and compress the input signal's information to match the desired output signal. Think of each layer as cutting away the unnecessary parts of the input information, and keeping and/or transforming the output-relevant information along the way through the network. The fully-connected neural network transforms the input information into a form, in the final hidden layer, such that it is linearly separable by the output layer.
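
That last claim, that the final hidden layer ends up linearly separable, is easy to probe empirically. Here is a minimal sketch (scikit-learn; the make_moons dataset, the hidden sizes, and the seeds are arbitrary choices of mine, not anything from Dr. Tishby's talk) that trains a small fully-connected net and then fits a plain linear classifier on the last hidden layer's activations:

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

    mlp = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                        random_state=0).fit(X, y)

    # Re-run the forward pass up to the last hidden layer
    # (relu is MLPClassifier's default activation).
    h = X
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0)

    # The raw inputs are not linearly separable; the learned hidden
    # representation typically is (or very nearly so).
    print("linear model on raw inputs :", LogisticRegression().fit(X, y).score(X, y))
    print("linear model on hidden reps:", LogisticRegression().fit(h, y).score(h, y))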

This is a very high-level and fundamental interpretation of the NN, and I hope it helps you see things more clearly. If there are parts you'd like me to clarify, let me know.

There are other essential pieces in Dr. Tishby's work, such as how minibatch noise helps training, and how the weights of a neural network layer can be seen as doing a random walk within the constraints of the problem. These parts are a little more detailed, and I'd recommend first toying with neural networks and taking a course on information theory to help build your understanding.

answered Oct 11 '22 by Sriram Gopalakrishnan


Suppose you have a large dataset and you want to build a binary classification model for it. You now have the two options you pointed out:

  • Logistic Regression

  • Neural networks (consider a feed-forward network for now)

Each node in a neural network is associated with an activation function. For example, let's choose the sigmoid, since logistic regression also uses the sigmoid internally to make its decision.

Let's see how the decision boundary of logistic regression looks when applied to the data: [figure: logistic regression decision boundary]

See some of the green points sitting inside the red region?

Now let's see the decision boundary of the neural network (forgive me for using a different color): [figure: neural network decision boundary]

Why does this happen? Why is the decision boundary of a neural network so flexible that it gives more accurate results than logistic regression?

The answer to the question you asked, "Why do neural networks work so well?", is: because of the hidden units (hidden layers) and their representational power.

Let me put it this way. You have a logistic regression model and a neural network with, say, 100 neurons, each with a sigmoid activation. Now each neuron is equivalent to one logistic regression.

Now imagine a hundred logistic units trained together to solve one problem, versus a single logistic regression model. Because of these hidden layers, the decision boundary becomes far more flexible and yields better results.

While you are experimenting, you can add more neurons and see how the decision boundary changes. Logistic regression is the same as a neural network with a single neuron.

The above is just an example; neural networks can be trained to learn very complex decision boundaries.
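
If you want to reproduce the comparison in the plots above, here is a minimal sketch (scikit-learn; the make_moons dataset, the 100-unit layer, and the seeds are arbitrary choices of mine):

    from sklearn.datasets import make_moons
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=1000, noise=0.25, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # One sigmoid unit: a straight-line decision boundary.
    logreg = LogisticRegression().fit(X_train, y_train)

    # 100 hidden sigmoid units ("100 logistic regressions trained
    # together"): the boundary can bend to follow the data.
    nn = MLPClassifier(hidden_layer_sizes=(100,), activation='logistic',
                       max_iter=2000, random_state=0).fit(X_train, y_train)

    print("logistic regression accuracy:", logreg.score(X_test, y_test))
    print("neural network accuracy     :", nn.score(X_test, y_test))

Try changing hidden_layer_sizes and re-running to watch the boundary grow more or less flexible.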

answered Oct 11 '22 by Sumith