Why Bother With Recurrent Neural Networks For Structured Data?

Tags:

I have been developing feedforward neural networks (FNNs) and recurrent neural networks (RNNs) in Keras with structured data of the shape [instances, time, features], and the performance of FNNs and RNNs has been the same (except that RNNs require more computation time).

I have also simulated tabular data (code below) where I expected a RNN to outperform a FNN because the next value in the series is dependent on the previous value in the series; however, both architectures predict correctly.

With NLP data, I have seen RNNs outperform FNNs, but not with tabular data. Generally, when would one expect a RNN to outperform a FNN with tabular data? Specifically, could someone post simulation code with tabular data demonstrating a RNN outperforming a FNN?

Thank you! If my simulation code is not ideal for my question, please adapt it or share a more ideal one!

from keras import models
from keras import layers

from keras.layers import Dense, LSTM

import numpy as np
import matplotlib.pyplot as plt

Two features were simulated over 10 time steps, where the value of the second feature is dependent on the value of both features in the prior time step.

## Simulate data.

np.random.seed(20180825)

X = np.random.randint(50, 70, size = (11000, 1)) / 100

X = np.concatenate((X, X), axis = 1)

for i in range(10):

    X_next = np.random.randint(50, 70, size = (11000, 1)) / 100

    X = np.concatenate((X, X_next, (0.50 * X[:, -1].reshape(len(X), 1)) 
        + (0.50 * X[:, -2].reshape(len(X), 1))), axis = 1)

print(X.shape)

## Training and validation data.

split = 10000

Y_train = X[:split, -1:].reshape(split, 1)
Y_valid = X[split:, -1:].reshape(len(X) - split, 1)
X_train = X[:split, :-2]
X_valid = X[split:, :-2]

print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)

FNN:

## FNN model.

# Define model.

network_fnn = models.Sequential()
network_fnn.add(layers.Dense(64, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))

# Compile model.

network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_fnn = network_fnn.fit(X_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
    validation_data = (X_valid, Y_valid))

plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

LSTM:

## LSTM model.

X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1] // 2, 2)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1] // 2, 2)

# Define model.

network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(64, activation = 'relu', input_shape = (X_lstm_train.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.

network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
    validation_data = (X_lstm_valid, Y_valid))

plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

531

asked Aug 25 '18 19:08

from keras import michael

1 Answers

In practice even in NLP you see that RNNs and CNNs are often competitive. Here's a 2017 review paper that shows this in more detail. In theory it might be the case that RNNs can handle the full complexity and sequential nature of language better but in practice the bigger obstacle is usually properly training the network and RNNs are finicky.

Another problem that might have a chance of working would be to look at a problem like the balanced parenthesis problem (either with just parentheses in the strings or parentheses along with other distractor characters). This requires processing the inputs sequentially and tracking some state and might be easier to learn with a LSTM then a FFN.

Update: Some data that looks sequential might not actually have to be treated sequentially. For example even if you provide a sequence of numbers to add since addition is commutative a FFN will do just as well as a RNN. This could also be true of many health problems where the dominating information is not of a sequential nature. Suppose every year a patient's smoking habits are measured. From a behavioral standpoint the trajectory is important but if you're predicting whether the patient will develop lung cancer the prediction will be dominated by just the number of years the patient smoked (maybe restricted to the last 10 years for the FFN).

So you want to make the toy problem more complex and to require taking into account the ordering of the data. Maybe some kind of simulated time series, where you want to predict whether there was a spike in the data, but you don't care about absolute values just about the relative nature of the spike.

Update2

I modified your code to show a case where RNNs perform better. The trick was to use more complex conditional logic that is more naturally modeled in LSTMs than FFNs. The code is below. For 8 columns we see that the FFN trains in 1 minute and reaches a validation loss of 6.3. The LSTM takes 3x longer to train but it's final validation loss is 6x lower at 1.06.

As we increase the number of columns the LSTM has a larger and larger advantage, especially if we added more complicated conditions in. For 16 columns the FFNs validation loss is 19 (and you can more clearly see the training curve as the model isn't able to instantly fit the data). In comparison the LSTM takes 11 times longer to train but has a validation loss of 0.31, 30 times smaller than the FFN! You can play around with even larger matrices to see how far this trend will extend.

from keras import models
from keras import layers

from keras.layers import Dense, LSTM

import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import time

matplotlib.use('Agg')

np.random.seed(20180908)

rows = 20500
cols = 10

# Randomly generate Z
Z = 100*np.random.uniform(0.05, 1.0, size = (rows, cols))

larger = np.max(Z[:, :cols/2], axis=1).reshape((rows, 1))
larger2 = np.max(Z[:, cols/2:], axis=1).reshape((rows, 1))
smaller = np.min((larger, larger2), axis=0)
# Z is now the max of the first half of the array.
Z = np.append(Z, larger, axis=1)
# Z is now the min of the max of each half of the array.
# Z = np.append(Z, smaller, axis=1)

# Combine and shuffle.

#Z = np.concatenate((Z_sum, Z_avg), axis = 0)

np.random.shuffle(Z)

## Training and validation data.

split = 10000

X_train = Z[:split, :-1]
X_valid = Z[split:, :-1]
Y_train = Z[:split, -1:].reshape(split, 1)
Y_valid = Z[split:, -1:].reshape(rows - split, 1)

print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)

print("Now setting up the FNN")

## FNN model.

tick = time.time()

# Define model.

network_fnn = models.Sequential()
network_fnn.add(layers.Dense(32, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))

# Compile model.

network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_fnn = network_fnn.fit(X_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
    validation_data = (X_valid, Y_valid))

tock = time.time()

print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')

print("Now evaluating the FNN")

loss_fnn = history_fnn.history['loss']
val_loss_fnn = history_fnn.history['val_loss']
epochs_fnn = range(1, len(loss_fnn) + 1)
print("train loss: ", loss_fnn[-1])
print("validation loss: ", val_loss_fnn[-1])

plt.plot(epochs_fnn, loss_fnn, 'black', label = 'Training Loss')
plt.plot(epochs_fnn, val_loss_fnn, 'red', label = 'Validation Loss')
plt.title('FNN: Training and Validation Loss')
plt.legend()
plt.show()

plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training points')
plt.show()

plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('valid points')
plt.show()

print("LSTM")

## LSTM model.

X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1], 1)

tick = time.time()

# Define model.

network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(32, activation = 'relu', input_shape = (X_lstm_train.shape[1], 1)))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.

network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
    validation_data = (X_lstm_valid, Y_valid))

tock = time.time()

print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')

print("now eval")

loss_lstm = history_lstm.history['loss']
val_loss_lstm = history_lstm.history['val_loss']
epochs_lstm = range(1, len(loss_lstm) + 1)
print("train loss: ", loss_lstm[-1])
print("validation loss: ", val_loss_lstm[-1])

plt.plot(epochs_lstm, loss_lstm, 'black', label = 'Training Loss')
plt.plot(epochs_lstm, val_loss_lstm, 'red', label = 'Validation Loss')
plt.title('LSTM: Training and Validation Loss')
plt.legend()
plt.show()

plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training')
plt.show()

plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title("validation")
plt.show()

answered Oct 12 '22 08:10

emschorsch

Related questions
                            
                                Using regex to remove comments from source files
                            
                                Paramiko "Unknown Server"
                            
                                Computing N Grams using Python
                            
                                Print series of prime numbers in python
                            
                                Finding matching keys in two large dictionaries and doing it fast
                            
                                list.append or list +=?
                            
                                Django migrate : doesn't create tables
                            
                                Generating unique, ordered Pythagorean triplets
                            
                                Django sort by distance
                            
                                Using Jython through IPython: is readline still an issue?
                            
                                Getting all task IDs from nested chains and chords
                            
                                Is `scipy.misc.comb` faster than an ad-hoc binomial computation?
                            
                                How to use Google Colaboratory server as python interpreter in Python IDE?
                            
                                Accessing stream output from hdfs of MRjob
                            
                                Description of TF Lite's Toco converter args for quantization aware training
                            
                                Embedding Python in MATLAB
                            
                                Multiple questions with Object-Oriented Bokeh [OBSOLETE]
                            
                                Loading SavedModel is a lot slower than loading a tf.train.Saver checkpoint
                            
                                How to re-order units based on their degree of desirable neighborhood ? (in Processing)
                            
                                Decorators on abstract methods

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why Bother With Recurrent Neural Networks For Structured Data?

Tags:

python

keras

recurrent-neural-network

prediction

from keras import michael

People also ask

1 Answers

emschorsch

Recent Activity

Donate For Us