Recurrent neural networks for Time Series with Multiple Variables - TensorFlow

I'm using past demand to predict future demand with 3 variables, but whenever I run the code, building the Y array raises an error.

If I build Y from a single variable on its own, there is no error.

Example:

demandaY = bike_data[['cnt']]
n_steps = 20

for time_step in range(1, n_steps+1):
    demandaY['cnt'+str(time_step)] = demandaY[['cnt']].shift(-time_step).values

y = demandaY.iloc[:, 1:].values
y = np.reshape(y, (y.shape[0], n_steps, 1))

DATASET

[Image: screenshot of the dataset]

SCRIPT

features = ['cnt','temp','hum']
demanda = bike_data[features]
n_steps = 20

for var_col in features:
    for time_step in range(1, n_steps+1):
        demanda[var_col+str(time_step)] = demanda[[var_col]].shift(-time_step).values

demanda.dropna(inplace=True)
demanda.head()

n_var = len(features)
columns = list(filter(lambda col: not(col.endswith("%d" % n_steps)), demanda.columns))

X = demanda[columns].iloc[:, :(n_steps*n_var)].values
X = np.reshape(X, (X.shape[0], n_steps, n_var))

y = demanda.iloc[:, 0].values
y = np.reshape(y, (y.shape[0], n_steps, 1))

OUTPUT

ValueError: cannot reshape array of size 17379 into shape (17379,20,1)

GitHub: repository

Luis Henrique asked Oct 28 '22 06:10



1 Answer

It is not clear whether the OP still wants an answer, but I will post the one I linked in the comments, with a few modifications.

Time series datasets come in different forms. Let's consider a dataset that has X as features and Y as labels. Depending on the problem, Y might be a sample from X shifted in time, or it can be another target variable you want to predict.

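import numpy as np

def create_dataset(X, Y, look_back=10, label_lag=-1, stride=1):
    # Build sliding windows over X and pick one label from Y per window.
    dataX, dataY = [], []

    # Stop early enough that the label index stays inside Y when label_lag >= 0;
    # for the default label_lag=-1 this equals len(X) - look_back + 1.
    last_start = len(X) - look_back - max(label_lag, -1)
    for i in range(0, last_start, stride):
        dataX.append(X[i:(i + look_back)])          # rows i .. i+look_back-1
        dataY.append(Y[i + look_back + label_lag])  # label relative to the window end
    return np.array(dataX), np.array(dataY)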

print(features.values.shape,labels.shape)
#(619,4), (619,1)

x,y = create_dataset(X=features.values,Y=labels.values,look_back=10,stride=1)
(x.shape,y.shape)
#(610, 10, 4), (610, 1)
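
As a quick check of the shapes quoted above, here is a minimal self-contained sketch. The random arrays only stand in for features.values and labels.values (the real data is not shown in this answer, so their contents are an assumption), and the function body is repeated so that the snippet runs on its own.

```python
import numpy as np

def create_dataset(X, Y, look_back=10, label_lag=-1, stride=1):
    dataX, dataY = [], []
    for i in range(0, len(X) - look_back + 1, stride):
        dataX.append(X[i:i + look_back])
        dataY.append(Y[i + look_back + label_lag])
    return np.array(dataX), np.array(dataY)

# Stand-in data (assumption): 619 rows, 4 feature columns, 1 label column.
rng = np.random.default_rng(0)
features_arr = rng.normal(size=(619, 4))
labels_arr = rng.normal(size=(619, 1))

x, y = create_dataset(features_arr, labels_arr, look_back=10, stride=1)
print(x.shape, y.shape)  # (610, 10, 4) (610, 1)
```

Each of the 610 samples is a block of 10 consecutive rows, and with the default label_lag the label comes from the last row of that block.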

Use of the parameters:

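  1. label_lag : with t the index of the last row of an X sample, the matching Y sample is taken at index t + label_lag + 1. The default value of -1 puts both X and Y at the same index t.

For the sample at position 1 (the second window), the row indices of its last X row and of its Y label are: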

if label_lag is -1:
np.where(x[1,-1]==features.values)[0],np.where(y[1] == labels.values)[0]
#(10,10,10,10), (10)

if label_lag is 0:
np.where(x[1,-1]==features.values)[0],np.where(y[1] == labels.values)[0]
#(10,10,10,10), (11)

  2. look_back : the number of past rows of your dataset included in each sample, ending at the current timestep t. A look_back of 10 means one sample holds the 10 rows from t-9 to t.

  3. stride : the index gap between the starts of two consecutive samples. With stride=2 and look_back=10, if the 1st sample of X covers rows 0 to 9, then the 2nd sample covers rows 2 to 11.
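
The effect of the three parameters is easiest to see on a toy series whose values equal their row index. The sketch below is only an illustration: the data is made up, and the loop bound is tightened (an adjustment assumed here) so that label_lag=0 cannot index past the end of Y.

```python
import numpy as np

def create_dataset(X, Y, look_back=10, label_lag=-1, stride=1):
    dataX, dataY = [], []
    # Assumed adjustment: bound the loop so the label index stays inside Y
    # even when label_lag >= 0.
    last_start = len(X) - look_back - max(label_lag, -1)
    for i in range(0, last_start, stride):
        dataX.append(X[i:i + look_back])
        dataY.append(Y[i + look_back + label_lag])
    return np.array(dataX), np.array(dataY)

# Toy series (assumption): the value in each row equals its row index.
X = np.arange(12).reshape(-1, 1)
Y = np.arange(12).reshape(-1, 1)

x_same, y_same = create_dataset(X, Y, look_back=3, label_lag=-1)
x_next, y_next = create_dataset(X, Y, look_back=3, label_lag=0)
x_skip, y_skip = create_dataset(X, Y, look_back=3, label_lag=-1, stride=2)

print(x_same[0].ravel(), y_same[0])  # [0 1 2] [2]  label at the window's last row
print(x_next[0].ravel(), y_next[0])  # [0 1 2] [3]  label one step after the window
print(x_skip[1].ravel(), y_skip[1])  # [2 3 4] [4]  second window starts 2 rows later
```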

Furthermore, you can also use a lookback window in Y, depending on your problem, and Y can also be multi-dimensional. In that case the only change is b = Y[i:(i+look_back+label_lag)]. Note that with the default label_lag of -1 this slice holds look_back - 1 rows.

The same functionality can be achieved with TimeseriesGenerator from Keras.
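
A minimal sketch of that variant, under stated assumptions: the name create_dataset_seq is hypothetical, the data is random stand-in data, and the loop bound is chosen here so that every Y slice has the same length. With label_lag=0 the Y window covers the same rows as the X window, which yields a 3-D target array like the (samples, n_steps, ...) shape the question was trying to build.

```python
import numpy as np

def create_dataset_seq(X, Y, look_back=10, label_lag=0, stride=1):
    # Hypothetical variant: the label is a slice of Y rather than a single row.
    dataX, dataY = [], []
    # Stop before a Y slice would run past the end, so all slices have equal length.
    last_start = len(X) - look_back - max(label_lag, 0) + 1
    for i in range(0, last_start, stride):
        dataX.append(X[i:i + look_back])
        dataY.append(Y[i:i + look_back + label_lag])  # the one-line change
    return np.array(dataX), np.array(dataY)

# Stand-in data (assumption): 50 rows, 3 feature columns, 2 target columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = rng.normal(size=(50, 2))

x, y = create_dataset_seq(X, Y, look_back=20, label_lag=0)
print(x.shape, y.shape)  # (31, 20, 3) (31, 20, 2)
```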

TimeseriesGenerator(features.values,labels.values,length=10,batch_size=64,stride=1)

where length is the same as look_back. By default the labels lead the features by one step, i.e. a sample in X covers rows t-9 to t and the corresponding sample in Y is at index t+1. If you want both at the same index, just shift the labels by one before passing them to the generator.
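
To make the one-step gap and the label shift concrete without requiring Keras, here is a small NumPy sketch. The helper emulate_generator_alignment is hypothetical and not part of Keras; it only mimics the pairing the generator is documented to use (a window of length rows followed by the target at the next index), and it ignores batching.

```python
import numpy as np

def emulate_generator_alignment(data, targets, length, stride=1):
    # Hypothetical helper: pair the window data[i-length:i] with targets[i].
    rows = range(length, len(data), stride)
    xs = np.array([data[i - length:i] for i in rows])
    ys = np.array([targets[i] for i in rows])
    return xs, ys

# Toy series (assumption): the value in each row equals its row index.
data = np.arange(15).reshape(-1, 1)
targets = np.arange(15).reshape(-1, 1)

# Default alignment: the label sits one step after the window's last row.
x_gap, y_gap = emulate_generator_alignment(data, targets, length=10)

# Shift the labels by one so shifted[i] == targets[i - 1]; the wrapped-around
# value at shifted[0] is never picked as a label because rows start at `length`.
shifted = np.roll(targets, 1, axis=0)
x_same, y_same = emulate_generator_alignment(data, shifted, length=10)

print(x_gap[0].ravel()[-1], y_gap[0])    # 9 [10]
print(x_same[0].ravel()[-1], y_same[0])  # 9 [9]
```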

Siddhant Tandon answered Nov 09 '22 14:11
