
Neural nets as universal approximators

The formal statement of the universal approximation theorem says that a neural net with a single hidden layer can approximate any function that is continuous on the m-dimensional unit hypercube. But what about functions that are not continuous? Is anything known about whether they can always be approximated by neural nets?

For example, take the function that returns the n-th digit of π. If I train a single-hidden-layer neural net on the data (n, n-th digit of π), will it eventually be able to return correct values for unseen n? What about neural nets with multiple hidden layers?

asked Nov 16 '11 by Sunny88


1 Answer

The formal statement of the universal approximation theorem says that a neural net with a single hidden layer can approximate any function that is continuous on the m-dimensional unit hypercube. But what about functions that are not continuous? Is anything known about whether they can always be approximated by neural nets?

Yes, most non-continuous functions can be approximated by neural nets. In fact, the function only needs to be measurable since, by Lusin's theorem, any measurable function is continuous on nearly all of its domain. This is good enough for the universal approximation theorem.
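To see what this looks like in practice, here is a minimal numpy sketch (not part of the original answer): a single sigmoid hidden unit with a large input weight already approximates a step function everywhere except in a shrinking neighbourhood of the jump, which is exactly the "continuous except on a small set" picture that Lusin's theorem gives. The 0.01 margin around the discontinuity is an arbitrary choice.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Discontinuous target: a step at x = 0.5 on the unit interval.
    def f(x):
        return (x >= 0.5).astype(float)

    x = np.linspace(0.0, 1.0, 10001)

    # One hidden sigmoid unit with a large weight acts as a soft step.
    # As the weight k grows, the unit matches f everywhere except on a
    # shrinking neighbourhood of the discontinuity.
    for k in (10, 100, 1000):
        approx = sigmoid(k * (x - 0.5))
        away_from_jump = np.abs(x - 0.5) > 0.01   # ignore a small band at the jump
        err = np.max(np.abs(approx[away_from_jump] - f(x[away_from_jump])))
        print(f"k = {k:5d}   max error away from the jump = {err:.6f}")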

Note, however, that the theorem only says that a function can be represented by a neural net. It does not say whether this representation can be learned, or whether it would be efficient. In fact, for a single-hidden-layer net approximating a highly varying function, the number of hidden units can grow exponentially with the function's complexity.

For example, take the function that returns the n-th digit of π. If I train a single-hidden-layer neural net on the data (n, n-th digit of π), will it eventually be able to return correct values for unseen n? What about neural nets with multiple hidden layers?

No. There are infinitely many functions consistent with any finite sample of digits of π; the net has no way of knowing which one you want it to learn. Neural nets generalize by exploiting the smoothness of the target function, and the digit sequence of π is not smooth at all.

In other words, you need an exact representation. An approximation is not useful for predicting the digits of π. The universal approximation theorem only guarantees the existence of an approximation.
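To make this concrete, here is a small experiment sketch (not from the original answer), using mpmath to generate digits of π and scikit-learn's MLPClassifier as the net; the 1000-digit sample, layer sizes, and train/test split are arbitrary choices. Whatever the net manages to memorize about the training n's, its accuracy on held-out n's should stay near the 10% you would get by guessing a digit at random.

    import numpy as np
    from mpmath import mp
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Generate the first 1000 decimal digits of pi.
    mp.dps = 1100                                   # extra precision headroom
    digits = str(mp.pi).replace('.', '')[:1000]     # "31415926535..."

    # Dataset: input n (scaled to [0, 1]), target = n-th digit of pi.
    X = (np.arange(1, 1001) / 1000.0).reshape(-1, 1)
    y = np.array([int(d) for d in digits])

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    clf = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=2000, random_state=0)
    clf.fit(X_train, y_train)

    print("train accuracy:", clf.score(X_train, y_train))  # partial memorization at best
    print("test accuracy: ", clf.score(X_test, y_test))    # expected near 0.1 (chance)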

answered Nov 04 '22 by Don Reba