 

Units in Dense layer in Keras

I am trying to understand a concept of ANN architecture in Keras. The number of input neurons in any NN should be equal to the number of features/attributes/columns. So, in the case of a matrix of shape (20000, 100), my input shape should have 100 neurons. In an example on the Keras page, I saw this code:

model = Sequential([Dense(32, input_shape=(784,))])

, which pretty much means that the input shape has 784 columns and 32 is the dimensionality of the output space, so the second layer will receive an input of 32. My understanding is that such a significant drop happens because some of the units are not activated due to the activation function. Is my understanding correct?

At the same time, another piece of code shows that the number of units is higher than the number of input features:

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

This example is not clear to me. How can the number of units be larger than the number of input dimensions?

Asked Jan 26 '23 by Emin Mammadov


2 Answers

The total number of neurons in a Dense layer is a topic that is still not agreed upon within the machine learning and data science community. There are many heuristics that are used to define this and I refer you to this post on Cross Validated that provides some more details: https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw.

In summary, the numbers of hidden units in the two snippets you quoted most likely originated from repeated experimentation and trial-and-error in pursuit of the best accuracy.

However, for more context: the answer, as I mentioned, comes from experimentation. The 784 input neurons most likely come from the MNIST dataset, whose images are 28 x 28 = 784 pixels. I've seen implementations of neural networks where 32 neurons for the hidden layer works well. Think of each layer as a dimensionality transformation: even if you go down to 32 dimensions, that doesn't necessarily mean you lose accuracy. Going from a lower-dimensional space to a higher-dimensional one is also common if you are trying to map your points to a new space that may be easier to classify.

Finally, in Keras, that number specifies how many neurons are in the current layer. Under the hood, Keras figures out the weight matrix needed to satisfy the forward propagation from the previous layer to the current one. In this case it would be 785 x 32, with one extra row for the bias unit.
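As a quick sanity check (a minimal sketch, assuming a standard Keras installation; the layer size is taken from the first snippet above), Keras's own parameter count matches the 785 x 32 figure:

from keras.models import Sequential
from keras.layers import Dense

# Dense(32) on a 784-dimensional input
model = Sequential([Dense(32, input_shape=(784,))])
model.summary()
# The Dense layer reports 25,120 parameters:
# 784 * 32 weights + 32 biases = 25,120 = 785 * 32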

Answered Feb 05 '23 by rayryeng


Neural networks are basically matrix multiplications. The drop you are talking about in the first part is not due to an activation function; it happens simply because of the nature of matrix multiplication:

The calculation here is: input * weights = output

so -> [BATCHSIZE, 784] * [784, 32] = [BATCHSIZE, 32] -> output dimension

With that logic we can easily explain how the input shape can be much smaller than the number of units; the calculation becomes:

-> [BATCHSIZE, 20] * [20, 64] = [BATCHSIZE, 64] -> output dimension
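A minimal NumPy sketch (the weight values here are random placeholders; only the shapes matter) reproduces both calculations:

import numpy as np

BATCHSIZE = 8

# First example: 784 input features -> 32 units
x1 = np.random.rand(BATCHSIZE, 784)
w1 = np.random.rand(784, 32)
print((x1 @ w1).shape)  # (8, 32)

# Second example: 20 input features -> 64 units
x2 = np.random.rand(BATCHSIZE, 20)
w2 = np.random.rand(20, 64)
print((x2 @ w2).shape)  # (8, 64)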

Hope that helps!

To learn more :

https://en.wikipedia.org/wiki/Matrix_multiplication

Answered Feb 05 '23 by Thibault Bacqueyrisses