How to decide the size of layers in Keras' Dense method?

Below is a simple example of a multi-class classification task with the IRIS data.

import seaborn as sns
import numpy as np
from sklearn.model_selection import train_test_split  # cross_validation was removed from newer scikit-learn
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l2
from keras.utils import to_categorical


#np.random.seed(1335)

# Prepare data
iris = sns.load_dataset("iris")
iris.head()
X = iris.values[:, 0:4].astype(np.float32)  # cast from dtype=object so Keras can consume it
y = iris.values[:, 4]


# Make test and train set
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.5, random_state=0)


################################
# Evaluate Keras Neural Network
################################


# Make ONE-HOT
def one_hot_encode_object_array(arr):
    '''One hot encode a numpy array of objects (e.g. strings)'''
    uniques, ids = np.unique(arr, return_inverse=True)
    return to_categorical(ids, len(uniques))

train_y_ohe = one_hot_encode_object_array(train_y)
test_y_ohe = one_hot_encode_object_array(test_y)


model = Sequential()
model.add(Dense(16, input_shape=(4,),
      activation="tanh",
      kernel_regularizer=l2(0.001)))
model.add(Dropout(0.5))
model.add(Dense(3, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')


# Actual modelling
# If you increase the number of epochs, accuracy will increase until it
# drops at a certain point: epoch 50 gives accuracy 0.99, which drops to
# 0.977 by epoch 70.
hist = model.fit(train_X, train_y_ohe, verbose=0, epochs=100, batch_size=1)


score, accuracy = model.evaluate(test_X, test_y_ohe, batch_size=16, verbose=0)
print("Test fraction correct (NN-Score) = {:.2f}".format(score))
print("Test fraction correct (NN-Accuracy) = {:.2f}".format(accuracy))

My question is: how do people usually decide the size of the layers? For example, based on the code above we have:

model.add(Dense(16, input_shape=(4,),
      activation="tanh",
      kernel_regularizer=l2(0.001)))
model.add(Dense(3, activation='sigmoid'))

where the first parameter of Dense is 16 and the second is 3.

  • Why do the two layers use different values for Dense?
  • How do we choose the best value for Dense?
asked Apr 30 '16 by neversaint


1 Answer

Basically it is just trial and error. Those are called hyperparameters and should be tuned on a validation set (split your original data into train/validation/test sets).

Tuning just means trying different combinations of parameters and keeping the one with the lowest loss value or the best accuracy on the validation set, depending on the problem.
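
For example, a minimal way to carve out such a validation set with scikit-learn is to call train_test_split twice; the 60/20/20 proportions below are only illustrative:

from sklearn.model_selection import train_test_split

# First split off a held-out test set, then split the remainder into
# train/validation: 0.25 of the remaining 80% yields a 60/20/20 split.
tmp_X, test_X, tmp_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)
train_X, val_X, train_y, val_y = train_test_split(tmp_X, tmp_y, test_size=0.25, random_state=0)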

There are two basic methods:

  • Grid search: For each parameter, decide a range and steps within that range, like 8 to 64 neurons in powers of two (8, 16, 32, 64), and try each combination of the parameters. This obviously requires an exponential number of models to be trained and tested, and takes a lot of time (a minimal sketch follows this list).

  • Random search: Do the same, but instead of enumerating all combinations, define a range for each parameter and sample random parameter sets from a uniform distribution over each range. You can try as many parameter sets as you want, for as long as you can. This is just an informed random guess.
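
To make this concrete, here is a minimal sketch of a grid search over the hidden-layer size. The candidate sizes and epoch count are arbitrary choices, and train_X/val_X are assumed to come from a split like the one above, with train_y_ohe/val_y_ohe one-hot encoded the same way as in the question:

from keras.models import Sequential
from keras.layers import Dense

# Train one model per candidate size and keep the one with the best
# validation accuracy.
best_acc, best_units = 0.0, None
for units in [8, 16, 32, 64]:
    model = Sequential()
    model.add(Dense(units, input_shape=(4,), activation='tanh'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    model.fit(train_X, train_y_ohe, epochs=50, batch_size=8, verbose=0)
    _, acc = model.evaluate(val_X, val_y_ohe, verbose=0)
    if acc > best_acc:
        best_acc, best_units = acc, units

print("Best hidden size: {} (validation accuracy {:.2f})".format(best_units, best_acc))

Random search is the same loop, except that each candidate is drawn at random (e.g. with random.randrange) instead of enumerated.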

Unfortunately there is no other way to tune such parameters. As for layers having different numbers of neurons, that could come from the tuning process, or you can also see it as dimensionality reduction: each layer produces a compressed version of the previous layer's output.
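
As a sketch of that funnel idea (the layer widths here are arbitrary, not a recommendation):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, input_shape=(4,), activation='tanh'))   # wide representation
model.add(Dense(16, activation='tanh'))                     # compressed version of the layer above
model.add(Dense(3, activation='softmax'))                   # one unit per output class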

answered Oct 23 '22 by Dr. Snoopy