
Can the sigmoid activation function be used to solve regression problems in Keras?

I have implemented simple neural networks in R, but this is my first time doing so with Keras, so I would appreciate some advice.

I developed a neural network function in Keras to predict car sales (the dataset is available here). CarSales is the dependent variable.

As far as I'm aware, Keras is used to develop a neural network for classification purposes rather than regression. In all the examples I have seen so far, the output is bounded between 0 and 1.

Here is the code I developed, and you will see that I am using the 'sigmoid' function for the output:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.python.keras.wrappers.scikit_learn import KerasRegressor
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

import os
path="C:/Users"  # os.chdir expects the directory containing cars.csv, not the file itself
os.chdir(path)
os.getcwd()

#Variables
dataset=np.loadtxt("cars.csv", delimiter=",")
x=dataset[:,0:5]
y=dataset[:,5]
y=np.reshape(y, (-1,1))
# Separate scalers: refitting a single scaler on y would overwrite the bounds learned from x
scaler_x = MinMaxScaler()
scaler_y = MinMaxScaler()
xscale=scaler_x.fit_transform(x)
yscale=scaler_y.fit_transform(y)

model = Sequential()
model.add(Dense(12, input_dim=5, kernel_initializer='normal', activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()

model.compile(loss='mse', optimizer='adam', metrics=['mse','mae','mape','cosine','accuracy'])
model.fit(xscale, yscale, epochs=150, batch_size=50,  verbose=1, validation_split=0.2)

As you can see, I am using the MinMaxScaler to bound the variables, and thus the output, between 0 and 1.

Output

After training for 150 epochs, values such as the mean_squared_error and mean_absolute_error are quite low. However, the mean_absolute_percentage_error is quite high, though I suspect that this is not a good metric to use when evaluating a sigmoid output.

Is bounding the output variable between 0 and 1 and then running the model an acceptable way of trying to predict an interval variable using a neural network?

asked Feb 25 '18 17:02 by empoleon


2 Answers

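To perform regression with a neural network, you should use a linear activation function in the final output layer. A linear output is unbounded, so predictions are not restricted to the (0, 1) range of a sigmoid.

Try the following code: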

model = Sequential()
model.add(Dense(12, input_dim=5, kernel_initializer='normal', activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='linear'))
model.summary()

answered Oct 26 '22 23:10 by Hemen Ashodia


Is bounding the output variable between 0 and 1 and then running the model an acceptable way of trying to predict an interval variable using a neural network?

I suppose that can work if you know the range of values that your output can take in advance. It's certainly not common though.
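
As a rough illustration of the known-range case, here is a minimal sketch in plain Python. The bounds `Y_MIN` and `Y_MAX` are hypothetical values chosen for the example, not taken from the car sales dataset. The only point is that a sigmoid output can be mapped back to original units when the bounds are fixed ahead of time rather than estimated from the data.

```python
import math

# Hypothetical bounds, assumed known in advance (not from the dataset).
Y_MIN = 0.0
Y_MAX = 50000.0

def sigmoid(z):
    """Logistic function; squashes z into the unit interval."""
    return 1.0 / (1.0 + math.exp(-z))

def to_unit_interval(y):
    """Scale a target from [Y_MIN, Y_MAX] to [0, 1] for training."""
    return (y - Y_MIN) / (Y_MAX - Y_MIN)

def to_original_units(p):
    """Map a sigmoid output back to [Y_MIN, Y_MAX]."""
    return Y_MIN + p * (Y_MAX - Y_MIN)

scaled_target = to_unit_interval(12500.0)
prediction = to_original_units(sigmoid(0.0))  # sigmoid(0) is 0.5, the midpoint
print(scaled_target, prediction)
```

Any target outside the assumed bounds cannot be represented by this setup, which is why the approach depends on knowing the range beforehand.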

With the following code, you're essentially cheating a bit. You're using all data (training and validation) to compute your bounds for the scaler, whereas only training data should be used.

dataset=np.loadtxt("cars.csv", delimiter=",")
x=dataset[:,0:5]
y=dataset[:,5]
y=np.reshape(y, (-1,1))
scaler_x = MinMaxScaler()
scaler_y = MinMaxScaler()
xscale=scaler_x.fit_transform(x)
yscale=scaler_y.fit_transform(y)

If you don't cheat like that though, you may get values in the validation data that exceed your bounds. If you then still use a sigmoid, you'll be unable to make correct predictions (which should lie outside [0, 1] if scaled according to bounds determined by training data only).
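
To illustrate the point above, here is a small plain-Python sketch. The target value 1.1 and the range of pre-activation values are made up for illustration. However large the pre-activation gets, the sigmoid output stays below 1, so the squared error against a scaled target of 1.1 cannot reach zero, whereas a linear output can hit the target exactly.

```python
import math

def sigmoid(z):
    """Logistic function; squashes z into the unit interval."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical scaled validation target that exceeds the training maximum.
target = 1.1

# Try pre-activations from -10 to 10 in steps of 0.5.
pre_activations = [z / 2.0 for z in range(-20, 21)]
sigmoid_outputs = [sigmoid(z) for z in pre_activations]
best_sigmoid_error = min((p - target) ** 2 for p in sigmoid_outputs)

# A linear output is the pre-activation itself, so 1.1 is reachable.
linear_output = 1.1
linear_error = (linear_output - target) ** 2

print(max(sigmoid_outputs) < 1.0)
print(best_sigmoid_error, linear_error)
```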

It's much more common to simply end with a linear layer for regression tasks, as Hemen suggested.

Your learning process may still benefit from scaling outputs in the training data to [0, 1], but then outputs outside training data could, for example, get mapped to 1.1 if they slightly exceed all values observed in training data.
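
The scaling described above might look like the following sketch, written in plain Python to stay dependency-free. The sales figures and the helper names `fit_min_max`, `scale`, and `unscale` are hypothetical. The bounds are computed from the training split only, so a validation value above the training maximum lands slightly above 1.

```python
def fit_min_max(values):
    """Return (min, max) computed from the given training values."""
    return min(values), max(values)

def scale(value, lo, hi):
    """Map value relative to [lo, hi]; results may fall outside [0, 1]."""
    return (value - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Invert scale(): map a scaled value back to original units."""
    return scaled * (hi - lo) + lo

# Hypothetical targets, split before any scaling is fitted.
y_train = [100.0, 150.0, 200.0]
y_val = [120.0, 210.0]  # 210 exceeds everything seen in training

lo, hi = fit_min_max(y_train)  # bounds come from the training data only
y_train_scaled = [scale(v, lo, hi) for v in y_train]
y_val_scaled = [scale(v, lo, hi) for v in y_val]

print(y_train_scaled)
print(y_val_scaled)  # the second value is above 1
print(unscale(y_val_scaled[1], lo, hi))  # back to original units
```

With scikit-learn, the same idea would correspond to calling `fit` on the training split only, then `transform` on the validation split and `inverse_transform` on the model's predictions.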

answered Oct 27 '22 01:10 by Dennis Soemers