
Estimating the number of neurons and number of layers of an artificial neural network [closed]

I am looking for a method to calculate the number of layers and the number of neurons per layer. As inputs I have only the size of the input vector, the size of the output vector, and the size of the training set.

Usually the best net is determined by trying different topologies and selecting the one with the least error. Unfortunately, I cannot do that.

asked Jul 27 '10 by ladi




1 Answer

This is a really hard problem.

The more internal structure a network has, the better that network will be at representing complex solutions. On the other hand, too much internal structure is slower, may cause training to diverge, or lead to overfitting -- which would prevent your network from generalizing well to new data.

People have traditionally approached this problem in several different ways:

  1. Try different configurations, see what works best. You can divide your training set into two pieces -- one for training, one for evaluation -- and then train and evaluate different approaches. Unfortunately, it sounds like in your case this experimental approach isn't available, though the first sketch after this list shows what it would look like.

  2. Use a rule of thumb. A lot of people have come up with a lot of guesses as to what works best. Concerning the number of neurons in the hidden layer, people have speculated that (for example) it should (a) be between the input and output layer size, (b) set to something near (inputs+outputs) * 2/3, or (c) never larger than twice the size of the input layer.

    The problem with rules of thumb is that they don't always take into account vital pieces of information, like how "difficult" the problem is, what the sizes of the training and testing sets are, and so on. Consequently, these rules are often used as rough starting points for the "let's-try-a-bunch-of-things-and-see-what-works-best" approach (the first sketch below feeds them in as candidate sizes to evaluate that way).

  3. Use an algorithm that dynamically adjusts the network configuration. Algorithms like Cascade Correlation start with a minimal network, then add hidden nodes during training. This can make your experimental setup a bit simpler, and (in theory) can result in better performance (because you won't accidentally use an inappropriate number of hidden nodes). The second sketch below mimics this grow-as-you-train idea in a very simplified form.
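Here is a minimal sketch of points 1 and 2 combined, assuming scikit-learn is available and a classification task. The names n_in, n_out, X, and y are hypothetical placeholders for the vector sizes and training data the question mentions. It turns the three rules of thumb into candidate hidden-layer sizes, then keeps whichever size scores best on a held-out validation split:

    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    def candidate_hidden_sizes(n_in, n_out):
        """Hidden-layer sizes suggested by the rules of thumb above."""
        return sorted({
            (n_in + n_out) // 2,      # (a) midpoint between input and output size
            2 * (n_in + n_out) // 3,  # (b) roughly (inputs + outputs) * 2/3
            2 * n_in,                 # (c) the "never more than twice the input" cap
        })

    def pick_hidden_size(X, y, n_in, n_out, seed=0):
        """Train one single-hidden-layer net per candidate size and keep
        the size that scores best on a held-out validation split."""
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.25, random_state=seed)
        best_size, best_score = None, -float("inf")
        for h in candidate_hidden_sizes(n_in, n_out):
            model = MLPClassifier(hidden_layer_sizes=(h,),
                                  max_iter=500, random_state=seed)
            model.fit(X_train, y_train)
            score = model.score(X_val, y_val)  # validation accuracy
            if score > best_score:
                best_size, best_score = h, score
        return best_size, best_score

    # e.g.: best_h, acc = pick_hidden_size(X, y, n_in=X.shape[1], n_out=len(set(y)))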
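And a deliberately simplified sketch of the point-3 idea. To be clear, this is not Cascade Correlation itself (which adds units one at a time, training each new unit's weights to correlate with the remaining error and then freezing them); it only borrows the start-minimal spirit, retraining from scratch at each size until the validation score stops improving:

    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    def grow_hidden_size(X, y, seed=0, patience=2, max_hidden=100):
        """Start with one hidden unit and add more until the validation
        score has failed to improve `patience` times in a row."""
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.25, random_state=seed)
        best_h, best_score, stale = 1, -float("inf"), 0
        for h in range(1, max_hidden + 1):
            model = MLPClassifier(hidden_layer_sizes=(h,),
                                  max_iter=500, random_state=seed)
            model.fit(X_train, y_train)
            score = model.score(X_val, y_val)
            if score > best_score:
                best_h, best_score, stale = h, score, 0
            else:
                stale += 1
                if stale >= patience:
                    break
        return best_h, best_score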

There's a lot of research on this subject -- so if you're really interested, there is a lot to read. In particular, check out:

  • Lawrence, S., Giles, C.L., and Tsoi, A.C. (1996), "What size neural network gives optimal generalization? Convergence properties of backpropagation". Technical Report UMIACS-TR-96-22 and CS-TR-3617, Institute for Advanced Computer Studies, University of Maryland, College Park.

  • Elisseeff, A., and Paugam-Moisy, H. (1997), "Size of multilayer networks for exact learning: analytic approach". Advances in Neural Information Processing Systems 9, Cambridge, MA: The MIT Press, pp.162-168.

answered Sep 20 '22 by Nate Kohl