I am trying to learn the EWMA (exponentially weighted moving average) formula on a time series with a simple RNN. I already posted about it here.
While the model converges beautifully using keras-tf (from tensorflow import keras), the exact same code doesn't work using native keras (import keras).
Converging model code (keras-tf):
from tensorflow import keras
import numpy as np

np.random.seed(1337)  # for reproducibility


def run_avg(signal, alpha=0.2):
    avg_signal = []
    avg = np.mean(signal)
    for i, sample in enumerate(signal):
        if np.isnan(sample) or sample == 0:
            sample = avg
        avg = (1 - alpha) * avg + alpha * sample
        avg_signal.append(avg)
    return np.array(avg_signal)


def train():
    x = np.random.rand(3000)
    y = run_avg(x)
    x = np.reshape(x, (-1, 1, 1))
    y = np.reshape(y, (-1, 1))

    input_layer = keras.layers.Input(batch_shape=(1, 1, 1), dtype='float32')
    rnn_layer = keras.layers.SimpleRNN(1, stateful=True, activation=None, name='rnn_layer_1')(input_layer)
    model = keras.Model(inputs=input_layer, outputs=rnn_layer)
    model.compile(optimizer=keras.optimizers.SGD(lr=0.1), loss='mse')
    model.summary()

    print(model.get_layer('rnn_layer_1').get_weights())
    model.fit(x=x, y=y, batch_size=1, epochs=10, shuffle=False)
    print(model.get_layer('rnn_layer_1').get_weights())


train()
Non-converging model code:
from keras import Model
from keras.layers import SimpleRNN, Input
from keras.optimizers import SGD
import numpy as np

np.random.seed(1337)  # for reproducibility


def run_avg(signal, alpha=0.2):
    avg_signal = []
    avg = np.mean(signal)
    for i, sample in enumerate(signal):
        if np.isnan(sample) or sample == 0:
            sample = avg
        avg = (1 - alpha) * avg + alpha * sample
        avg_signal.append(avg)
    return np.array(avg_signal)


def train():
    x = np.random.rand(3000)
    y = run_avg(x)
    x = np.reshape(x, (-1, 1, 1))
    y = np.reshape(y, (-1, 1))

    input_layer = Input(batch_shape=(1, 1, 1), dtype='float32')
    rnn_layer = SimpleRNN(1, stateful=True, activation=None, name='rnn_layer_1')(input_layer)
    model = Model(inputs=input_layer, outputs=rnn_layer)
    model.compile(optimizer=SGD(lr=0.1), loss='mse')
    model.summary()

    print(model.get_layer('rnn_layer_1').get_weights())
    model.fit(x=x, y=y, batch_size=1, epochs=10, shuffle=False)
    print(model.get_layer('rnn_layer_1').get_weights())


train()
In the converging tf-keras model, the loss decreases and the weights closely approximate the EWMA formula; in the non-converging model, the loss explodes to nan. As far as I can tell, the only difference is how I import the classes.
I used the same random seed for both implementations. I am working on a Windows PC in an Anaconda environment with keras 2.2.4 and tensorflow 1.13.1 (which bundles keras 2.2.4-tf).
Any insights on this?
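For reference, here is a minimal sketch of what "the weights approximate the EWMA formula" means in this setup. A one-unit SimpleRNN with activation=None computes h_t = kernel * x_t + recurrent_kernel * h_(t-1) + bias, so the EWMA with alpha=0.2 corresponds to kernel = 0.2, recurrent_kernel = 0.8 and bias = 0. The function linear_rnn and the short test signal are hypothetical and written in plain Python for illustration only; they are not part of either Keras implementation.

```python
import math


def run_avg(signal, alpha=0.2):
    """EWMA as defined in the question, rewritten without NumPy."""
    avg = sum(signal) / len(signal)
    out = []
    for sample in signal:
        if math.isnan(sample) or sample == 0:
            sample = avg
        avg = (1 - alpha) * avg + alpha * sample
        out.append(avg)
    return out


def linear_rnn(signal, kernel, recurrent_kernel, bias, initial_state):
    """Hypothetical stand-in for a one-unit SimpleRNN with activation=None.

    Applies h_t = kernel * x_t + recurrent_kernel * h_(t-1) + bias.
    """
    h = initial_state
    out = []
    for x in signal:
        h = kernel * x + recurrent_kernel * h + bias
        out.append(h)
    return out


alpha = 0.2
signal = [0.5, 0.1, 0.9, 0.3, 0.7]  # made-up samples, no zeros or NaNs

expected = run_avg(signal, alpha)
actual = linear_rnn(
    signal,
    kernel=alpha,                # weight on the current sample
    recurrent_kernel=1 - alpha,  # weight on the previous average
    bias=0.0,
    initial_state=sum(signal) / len(signal),  # run_avg starts from the mean
)

# Largest absolute difference between the two sequences.
max_error = max(abs(e - a) for e, a in zip(expected, actual))
print(max_error)
```

When comparing against a trained model, note that get_weights() on a SimpleRNN layer returns [kernel, recurrent_kernel, bias] in that order. A stateful Keras layer starts from a zero state rather than from the signal mean, so the first few outputs can differ from run_avg even with ideal weights.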
This might be due to a one-line difference in the implementation of SimpleRNN between TF Keras and native Keras.
The line below is present in TF Keras but not in native Keras:
self.input_spec = [InputSpec(ndim=3)]
The case you describe above is one consequence of this difference.
I want to demonstrate a similar case using the Sequential class of Keras.
The code below works fine with TF Keras:
from tensorflow import keras
import numpy as np
from tensorflow.keras.models import Sequential

np.random.seed(1337)  # for reproducibility


def run_avg(signal, alpha=0.2):
    avg_signal = []
    avg = np.mean(signal)
    for i, sample in enumerate(signal):
        if np.isnan(sample) or sample == 0:
            sample = avg
        avg = (1 - alpha) * avg + alpha * sample
        avg_signal.append(avg)
    return np.array(avg_signal)


def train():
    x = np.random.rand(3000)
    y = run_avg(x)
    x = np.reshape(x, (-1, 1, 1))
    y = np.reshape(y, (-1, 1))

    # SimpleRNN model
    model = Sequential()
    model.add(keras.layers.Input(batch_shape=(1, 1, 1), dtype='float32'))
    model.add(keras.layers.SimpleRNN(1, stateful=True, activation=None, name='rnn_layer_1'))
    model.compile(optimizer=keras.optimizers.SGD(lr=0.1), loss='mse')
    model.summary()

    print(model.get_layer('rnn_layer_1').get_weights())
    model.fit(x=x, y=y, batch_size=1, epochs=10, shuffle=False)
    print(model.get_layer('rnn_layer_1').get_weights())


train()
But if we run the same code with native Keras, we get the error shown below:
TypeError: The added layer must be an instance of class Layer. Found: Tensor("input_1_1:0", shape=(1, 1, 1), dtype=float32)
If we replace the line
model.add(Input(batch_shape=(1, 1, 1), dtype='float32'))
with
model.add(Dense(32, batch_input_shape=(1,1,1), dtype='float32'))
then even the native Keras model converges almost as well as the TF Keras implementation.
To see how the two implementations differ at the code level, refer to the links below:
https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/keras/layers/recurrent.py#L1364-L1375
https://github.com/keras-team/keras/blob/master/keras/layers/recurrent.py#L1082-L1091