Can neural networks approximate any function given enough hidden neurons?

Tags:

machine-learning

neural-network

I understand that neural networks with any number of hidden layers can approximate nonlinear functions; however, can they approximate:

f(x) = x^2

I can't think of how it could. It seems like a very obvious limitation of neural networks that could limit what they can do. For example, because of this limitation, neural networks probably can't properly approximate many functions used in statistics, such as the exponential moving average, or even variance.

Speaking of moving averages, can recurrent neural networks properly approximate them? I understand how a feedforward neural network, or even a single linear neuron, can output a moving average using the sliding-window technique, but how would a recurrent neural network do it without X hidden layers (X being the moving-average window size)?
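For concreteness, the sliding-window idea from this paragraph can be sketched as one linear neuron whose N weights are all 1/N, slid along the sequence (equivalently, a 1-D convolution). This is a minimal sketch assuming NumPy; `moving_average_neuron` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def moving_average_neuron(x, window):
    """Simple moving average computed by one linear neuron on a sliding window.

    The neuron has `window` weights, all equal to 1/window, and no bias;
    sliding it over the sequence is the same as a 1-D convolution.
    """
    x = np.asarray(x, dtype=float)
    weights = np.full(window, 1.0 / window)  # fixed weights of the linear neuron
    # 'valid' mode: one output per full window, i.e. len(x) - window + 1 outputs
    return np.convolve(x, weights, mode="valid")

print(moving_average_neuron([1, 2, 3, 4, 5], window=3))
```

Note that the number of weights equals the window size, which is exactly the dependence on X the question is worried about.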

Also, let us assume we don't know the original function f, which happens to compute the average of the last 500 inputs and then output 1 if it is higher than 3, and 0 if it is not. But for a second, pretend we don't know that; it's a black box.
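To make the black box concrete, here is a minimal sketch of the target function described above. It assumes NumPy, and it assumes the function only produces an output once 500 inputs have been seen (the question does not say what happens before that). `black_box_target` is a hypothetical name; a learner would only ever observe the (input, output) pairs, never this code.

```python
import numpy as np

def black_box_target(x, window=500, threshold=3.0):
    """Hypothetical target from the question: 1 if the mean of the last
    `window` inputs is greater than `threshold`, otherwise 0.

    Assumption: outputs start once a full window is available, so the
    result has len(x) - window + 1 entries.
    """
    x = np.asarray(x, dtype=float)
    # cumulative-sum trick: sum(x[i:i + window]) == c[i + window] - c[i]
    c = np.concatenate(([0.0], np.cumsum(x)))
    window_means = (c[window:] - c[:-window]) / window
    return (window_means > threshold).astype(int)

x = np.concatenate([np.full(500, 2.0), np.full(500, 4.0)])
y = black_box_target(x)
print(y[:3], y[-3:])
```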

How would a recurrent neural network approximate that? We would first need to know how many timesteps it should have, which we don't. Perhaps an LSTM network could, but even then, what if it's not a simple moving average but an exponential moving average? I don't think even an LSTM can do it.
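For reference, an exponential moving average with smoothing factor alpha satisfies s_t = alpha * x_t + (1 - alpha) * s_{t-1}, which has the same form as the update of a single recurrent unit with a linear (identity) activation. The sketch below assumes NumPy and assumes the state starts at h_0 = x_0 (other conventions start from 0); `ema_recurrent_unit` is a hypothetical name.

```python
import numpy as np

def ema_recurrent_unit(x, alpha):
    """Exponential moving average as a single linear recurrent unit.

    Update: h_t = (1 - alpha) * h_{t-1} + alpha * x_t, i.e. recurrent weight
    (1 - alpha), input weight alpha, identity activation, no bias.
    Assumption: the state starts at h_0 = x_0.
    """
    x = np.asarray(x, dtype=float)
    h = np.empty_like(x)
    h[0] = x[0]
    for t in range(1, len(x)):
        h[t] = (1.0 - alpha) * h[t - 1] + alpha * x[t]
    return h

print(ema_recurrent_unit([1.0, 2.0, 3.0, 4.0], alpha=0.5))
```

The state is a single number no matter how long the effective averaging window is, so no window length has to be fixed in advance.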

Even worse still, what if the f(x,x1) that we are trying to learn is simply

f(x,x1) = x * x1

That seems very simple and straightforward. Can a neural network learn it? I don't see how.
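One useful observation: multiplication can be written using only squaring, since x * x1 = ((x + x1)^2 - (x - x1)^2) / 4. So on a bounded input range, anything that can approximate squaring can also approximate the product. The sketch below assumes NumPy and inputs in [-1, 1]. In place of a trained network it uses a piecewise-linear interpolant of z^2, which a one-hidden-layer ReLU network can represent exactly. `approx_square` and `approx_product` are hypothetical helpers.

```python
import numpy as np

def approx_square(z, knots):
    """Piecewise-linear interpolant of z**2 through `knots`.
    A one-hidden-layer ReLU network can represent exactly this function."""
    return np.interp(z, knots, knots ** 2)

def approx_product(x, y, n_knots=41):
    """Approximate x*y for x, y in [-1, 1] using only (approximate) squaring:
    x*y = ((x + y)**2 - (x - y)**2) / 4, where x + y and x - y lie in [-2, 2]."""
    knots = np.linspace(-2.0, 2.0, n_knots)
    return (approx_square(x + y, knots) - approx_square(x - y, knots)) / 4.0

rng = np.random.default_rng(0)
x, y = rng.uniform(-1, 1, 1000), rng.uniform(-1, 1, 1000)
print(np.max(np.abs(approx_product(x, y) - x * y)))
```

With knot spacing h, each interpolated square overshoots z^2 by between 0 and h^2/4, so the two errors partly cancel and the product error is at most h^2/16 (about 6e-4 with the default 41 knots on [-2, 2]).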

Am I missing something huge here or are machine learning algorithms extremely limited? Are there other learning techniques besides neural networks that can actually do any of this?

asked Sep 01 '14 15:09

Essam Al-Mansouri

3 Answers

The key point to understand is compact:

Neural networks (like any other approximation structure, such as polynomials, splines, or radial basis functions) can approximate any continuous function only within a compact set.

In other words, the theory states that, given:

1. A continuous function f(x),
2. A finite range for the input x, [a,b], and
3. A desired approximation accuracy ε>0,

then there exists a neural network that approximates f(x) with an approximation error less than ε, everywhere within [a,b].
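Here is a minimal sketch of this statement for f(x) = x², assuming ReLU hidden units (the answer does not fix an activation) and NumPy. The network is built by hand to linearly interpolate f at equally spaced knots rather than trained, and `fit_relu_net` and `hidden_units_for_square` are hypothetical names. For any [a, b] and ε > 0 a finite one-hidden-layer network does the job; the number of hidden units grows as the interval widens or ε shrinks.

```python
import math
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def fit_relu_net(f, a, b, n):
    """Hand-built one-hidden-layer ReLU network with n hidden units that
    linearly interpolates f at n + 1 equally spaced knots t_0..t_n in [a, b].

    f_hat(x) = c + sum_k w_k * relu(x - t_k), k = 0..n-1
    """
    t = np.linspace(a, b, n + 1)
    slopes = np.diff(f(t)) / np.diff(t)                 # slope on each segment
    w = np.concatenate(([slopes[0]], np.diff(slopes)))  # output weights = slope changes
    c = f(t[0])                                         # output bias
    knots = t[:-1]                                      # hidden biases are -knots

    def net(x):
        x = np.asarray(x, dtype=float)
        return c + relu(x[..., None] - knots) @ w

    return net

def hidden_units_for_square(eps, a, b):
    """For f(x) = x**2 the interpolation error on a segment of width h is at most
    h**2 / 4, so choosing h <= 2 * sqrt(eps) keeps the error at or below eps."""
    return math.ceil((b - a) / (2.0 * math.sqrt(eps)))

square = lambda x: x ** 2
n = hidden_units_for_square(1e-3, -1.0, 1.0)
net = fit_relu_net(square, -1.0, 1.0, n)
xs = np.linspace(-1.0, 1.0, 10001)
print(n, np.max(np.abs(net(xs) - square(xs))))
```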

Regarding your example of f(x) = x², yes, you can approximate it with a neural network within any finite range: [-1,1], [0, 1000], etc. To visualise this, imagine that you approximate f(x) within [-1,1] with a step function. Can you do it on paper? Note that if you make the steps narrow enough, you can achieve any desired accuracy. The way neural networks approximate f(x) is not much different from this.
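The step-function picture can be checked numerically. This sketch (NumPy assumed; `step_approximation` is a hypothetical helper) builds the staircase as f(t_0) plus a sum of Heaviside steps. A sigmoid unit with a very steep slope behaves like one such step, which is the link to neural networks. On [-1, 1], |f'(x)| = |2x| ≤ 2, so steps of width h keep the error at or below 2h.

```python
import numpy as np

def step_approximation(f, a, b, n_steps):
    """Staircase approximation of f on [a, b]: the constant f(t_j) on each
    interval [t_j, t_{j+1}), written as f(t_0) plus a sum of Heaviside steps."""
    t = np.linspace(a, b, n_steps + 1)
    jumps = np.diff(f(t))  # jump height at t_1, ..., t_n

    def staircase(x):
        x = np.asarray(x, dtype=float)
        # H(x - t_k) for k = 1..n; sigmoid(beta * (x - t_k)) tends to this as beta grows
        steps = (x[..., None] >= t[1:]).astype(float)
        return f(t[0]) + steps @ jumps

    return staircase

square = lambda x: x ** 2
xs = np.linspace(-1.0, 1.0, 2001)
for n in (10, 100, 1000):
    err = np.max(np.abs(step_approximation(square, -1.0, 1.0, n)(xs) - square(xs)))
    print(f"{n:5d} steps of width {2.0 / n:.3f}: max error {err:.4f}")
```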

But again, with the usual activation functions (bounded sigmoids, or piecewise-linear units such as ReLU), there is no neural network with a finite number of parameters that can approximate f(x) = x² to within ε for all x in (-∞, +∞).
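To see the failure outside the compact set, the following sketch (NumPy assumed; `relu_interpolant` is a hypothetical helper) builds a one-hidden-layer ReLU network that matches x² closely on [-1, 1]. Beyond its last kink the network is just a straight line, so its error grows roughly like x² for large |x|. The network is built by hand as a stand-in for a trained one.

```python
import numpy as np

def relu_interpolant(a, b, n):
    """One-hidden-layer ReLU network (n hidden units) interpolating x**2
    at n + 1 equally spaced knots in [a, b]."""
    t = np.linspace(a, b, n + 1)
    s = np.diff(t ** 2) / np.diff(t)            # slope on each segment
    w = np.concatenate(([s[0]], np.diff(s)))    # output weights = slope changes

    def net(x):
        x = np.asarray(x, dtype=float)
        return t[0] ** 2 + np.maximum(x[..., None] - t[:-1], 0.0) @ w

    return net

net = relu_interpolant(-1.0, 1.0, 100)
for x in (0.5, 2.0, 10.0, 100.0):
    print(f"x = {x:6}: x**2 = {x ** 2:10.2f}, network = {float(net(x)):8.2f}")
```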

answered Oct 19 '22 00:10

Panagiotis Panagi

The question is very legitimate, and unfortunately many of the answers show how little practitioners seem to know about the theory of neural networks. The best-known rigorous theorem about the ability of neural networks to approximate different kinds of functions is the Universal Approximation Theorem.

The UAT states that any continuous function on a compact domain can be approximated by a neural network with only one hidden layer, provided the activation functions used are BOUNDED, continuous and monotonically increasing. Now, a finite sum of bounded functions is itself bounded, so the output of such a network is bounded over the whole input space.

A non-constant polynomial is not bounded, so the best we can do is provide a neural network approximation of that polynomial over a compact subset of R^n. Outside of this compact subset, the approximation will fail miserably because the polynomial grows without bound. In other words, the neural network will work well over the range covered by the training data but will not extrapolate beyond it!
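The boundedness argument is easy to check directly. In this sketch (NumPy assumed; the weights are random placeholders, not a trained model), every sigmoid output lies in [0, 1], so a one-hidden-layer network satisfies |net(x)| ≤ |b2| + Σ|w2_i| for every x, whatever the weights. Any such network therefore falls behind x² once |x| exceeds the square root of that bound.

```python
import numpy as np

def sigmoid(z):
    # clipping only avoids overflow warnings in exp for huge |z|
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

rng = np.random.default_rng(0)
n_hidden = 50
W1, b1 = rng.normal(size=n_hidden), rng.normal(size=n_hidden)  # input -> hidden
w2, b2 = rng.normal(size=n_hidden), rng.normal()               # hidden -> output

def net(x):
    """One-hidden-layer sigmoid network with arbitrary (random) weights."""
    x = np.asarray(x, dtype=float)
    return sigmoid(x[..., None] * W1 + b1) @ w2 + b2

# Each sigmoid lies in [0, 1], so |net(x)| <= |b2| + sum(|w2|) for every x.
bound = abs(b2) + np.abs(w2).sum()
xs = np.array([0.0, 1e1, 1e3, 1e6, -1e6])
print(bound, net(xs))
```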

The question is neither off-topic nor does it represent the OP's opinion.

answered Oct 18 '22 23:10

Tarek Nassar

I am not sure why there is such a visceral reaction. I think it is a legitimate question that is hard to find by googling, even though the answer is widely appreciated and repeated out loud. In this case, I think you are looking for the actual citations showing that a neural net can approximate any function. This recent paper explains it nicely, in my opinion. They also cite the original paper by Barron from 1993, which proved a less general result. The conclusion: a two-layer neural network can represent any bounded-degree polynomial, under certain (seemingly non-restrictive) conditions.

The paper is "Learning Polynomials with Neural Networks" by Andoni et al., 2014.
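The paper above studies learning polynomials with gradient descent. As a much simpler illustration, and explicitly not the method of Andoni et al., here is a one-hidden-layer network that learns the degree-2 polynomial x * x1 from samples on the compact square [-1, 1]². Its hidden ReLU weights are random and fixed, and only the output layer is fitted, by least squares (a "random features" shortcut). NumPy is assumed; the layer size, sampling ranges and sample counts are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """n points (x, x1) drawn uniformly from the compact square [-1, 1]^2."""
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    return X, X[:, 0] * X[:, 1]

# Hidden layer: ReLU units with random, fixed weights and biases
# (used instead of backpropagation to keep the example short and deterministic).
n_hidden = 300
W = rng.normal(size=(2, n_hidden))
b = rng.uniform(-1.5, 1.5, size=n_hidden)

def hidden(X):
    H = np.maximum(X @ W + b, 0.0)
    return np.hstack([H, np.ones((len(X), 1))])  # extra column acts as output bias

# Output layer: fit the linear read-out by least squares on training samples.
X_train, z_train = sample(2000)
w_out, *_ = np.linalg.lstsq(hidden(X_train), z_train, rcond=None)

X_test, z_test = sample(1000)
rmse = np.sqrt(np.mean((hidden(X_test) @ w_out - z_test) ** 2))
print(f"test RMSE on [-1, 1]^2: {rmse:.4f}")
```

For comparison, always predicting 0 gives an RMSE of about 1/3 on this square, and any purely linear model does no better, so a small test error means the hidden layer is doing real nonlinear work.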

answered Oct 18 '22 22:10

Martha White
