Big difference when using Relu over Tanh on a simple problem

Question

I was testing a toy problem: the input is a vector of zeros and ones, and the output is whether the number of ones is odd or even (simplicity itself). With an MLP that uses Tanh activation, I never managed to get beyond random-guess performance (~50%)! Purely by chance, I tried ReLU (out of desperation), and... it worked perfectly (reaching an accuracy of 100% most of the time).

Then, while discussing it with a friend, we wanted to see what would happen if we replaced the zeros with -1 (the task stays the same: odd or even number of ones). To my sheer surprise, it worked with Tanh (accuracy between 75% and 90%). ReLU still performs better.

The code

import numpy as np
from sklearn.neural_network import MLPClassifier
# from sklearn.preprocessing import StandardScaler
def generate_data(batch_size, data_length=10, zeros=True):
    x = np.random.randint(0, 2, (batch_size, data_length))

    y = x.sum(axis=1) % 2
    y = y.astype(np.int16).reshape(-1)

    if not zeros: # in this case, convert the zeros to -1
        x[x==0] = -1 
    return x, y

# With ReLU it is perfect! With Tanh it stays at chance level (~50%)
# clf = MLPClassifier(solver='adam', verbose=True, batch_size=512, activation="relu")
clf = MLPClassifier(solver='adam', verbose=True, batch_size=512, activation="tanh")

X_train, y_train = generate_data(batch_size=10000, data_length=10, zeros=True)
X_test, y_test = generate_data(batch_size=1000, data_length=10, zeros=True)

clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

To use -1 instead of zeros, just pass zeros=False when calling the generate_data function.
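For anyone who wants to reproduce the comparison in one go, here is a minimal sketch that loops over both activations and both input encodings. It reuses the generate_data function and classifier settings from above; the results dictionary and the loop are my own additions for illustration, and the scores will vary from run to run.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def generate_data(batch_size, data_length=10, zeros=True):
    # Random 0/1 vectors; the label is the parity of the number of ones.
    x = np.random.randint(0, 2, (batch_size, data_length))
    y = (x.sum(axis=1) % 2).astype(np.int16)
    if not zeros:
        # Re-encode the zeros as -1; the labels are unchanged.
        x[x == 0] = -1
    return x, y


# Hypothetical helper structure: maps (activation, zeros) to test accuracy.
results = {}
for activation in ("relu", "tanh"):
    for zeros in (True, False):
        X_train, y_train = generate_data(10000, zeros=zeros)
        X_test, y_test = generate_data(1000, zeros=zeros)
        clf = MLPClassifier(solver="adam", batch_size=512, activation=activation)
        clf.fit(X_train, y_train)
        results[(activation, zeros)] = clf.score(X_test, y_test)

for (activation, zeros), accuracy in results.items():
    encoding = "0/1" if zeros else "-1/1"
    print(f"{activation:>4} with {encoding} inputs: accuracy {accuracy:.3f}")
```

scikit-learn may print a ConvergenceWarning for some of the runs; that only means the default max_iter was reached before the stopping criterion.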

Can someone please explain what is happening here?

Edit: Thanks to @BlackBear and @Andreas K. for their answers. So apparently using Tanh leads the neurons to saturate (the gradients become very small, so the weights barely move). With a better choice of learning rate, or by letting the network optimize for longer, it does work. For example, after changing the classifier settings to

clf = MLPClassifier(solver='adam', verbose=True, batch_size=512, activation="tanh", max_iter=5000, learning_rate="adaptive", n_iter_no_change=100)

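It always works! (Note that, according to the scikit-learn documentation, learning_rate is only used when solver='sgd', so with adam the changes that matter here are max_iter and n_iter_no_change.)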
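To make "saturation" concrete, here is a small sketch of the Tanh derivative, 1 - tanh(z)^2, at a few pre-activation values. It only illustrates why large pre-activations yield tiny gradients; it does not show that this is what happens inside the network above. The function name tanh_grad and the sample values are chosen for illustration.

```python
import numpy as np


def tanh_grad(z):
    # Derivative of tanh(z) with respect to z.
    return 1.0 - np.tanh(z) ** 2


# Pre-activation values, from the linear region to deep saturation.
z = np.array([0.0, 1.0, 3.0, 6.0])
grads = tanh_grad(z)

for value, grad in zip(z, grads):
    print(f"z = {value:4.1f}  ->  d tanh / dz = {grad:.6f}")
```

The gradient is 1 at z = 0 and shrinks rapidly as |z| grows, so a unit whose pre-activation is large passes almost no gradient back to its weights.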

BlackBear · Accepted Answer

It is just an issue with the optimization procedure, which is not able to find good values for the weights; it is not a limit on what such a network can represent. You can construct a network with 2^10 = 1024 neurons in the hidden layer, one for each input pattern, and let the output neuron respond to the hidden neurons corresponding to inputs with an even number of ones. With this construction, you can model any Boolean function.
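The construction can be sketched by hand in NumPy, with no training at all. This is one possible way to set the weights, assuming 0/1 inputs and ReLU hidden units; the names W_hidden, b_hidden, w_out and predict_even are chosen for illustration.

```python
import itertools

import numpy as np

n_bits = 10

# All 2^10 = 1024 input patterns, one hidden neuron per pattern.
patterns = np.array(list(itertools.product([0, 1], repeat=n_bits)))

# Neuron k has weight +1 where its pattern has a one and -1 where it has a zero.
# Its pre-activation equals (number of ones in the pattern) only for an exact
# match and is at least 1 lower otherwise, so this bias leaves +0.5 on a match
# and a negative value (zeroed by ReLU) on every other input.
W_hidden = 2 * patterns - 1
b_hidden = 0.5 - patterns.sum(axis=1)

# The output neuron listens only to neurons whose pattern has an even number of ones.
w_out = 2.0 * (patterns.sum(axis=1) % 2 == 0)


def predict_even(x):
    # Returns 1 for inputs with an even number of ones, 0 otherwise.
    hidden = np.maximum(0.0, x @ W_hidden.T + b_hidden)
    return (hidden @ w_out > 0.5).astype(int)


predictions = predict_even(patterns)
expected = (patterns.sum(axis=1) % 2 == 0).astype(int)
print("all 1024 patterns correct:", np.array_equal(predictions, expected))
```

Changing which entries of w_out are non-zero selects a different set of input patterns, which is why the same construction covers any Boolean function of the 10 inputs. The label y in the question is 1 for an odd count, so it is simply 1 - predict_even(x).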

Tags:

python

machine-learning

neural-network

numpy

scikit-learn
