I'm using previous demand to predict future demand, using 3 variables, but whenever I run the code my Y axis shows an error.
If I use only one variable on the Y axis separately, it has no error.
Example:
demandaY = bike_data[['cnt']]
n_steps = 20
for time_step in range(1, n_steps+1):
    demandaY['cnt'+str(time_step)] = demandaY[['cnt']].shift(-time_step).values
y = demandaY.iloc[:, 1:].values
y = np.reshape(y, (y.shape[0], n_steps, 1))
DATASET
SCRIPT
features = ['cnt','temp','hum']
demanda = bike_data[features]
n_steps = 20
for var_col in features:
    for time_step in range(1, n_steps+1):
        demanda[var_col+str(time_step)] = demanda[[var_col]].shift(-time_step).values
demanda.dropna(inplace=True)
demanda.head()
n_var = len(features)
columns = list(filter(lambda col: not(col.endswith("%d" % n_steps)), demanda.columns))
X = demanda[columns].iloc[:, :(n_steps*n_var)].values
X = np.reshape(X, (X.shape[0], n_steps, n_var))
y = demanda.iloc[:, 0].values
y = np.reshape(y, (y.shape[0], n_steps, 1))
OUTPUT
ValueError: cannot reshape array of size 17379 into shape (17379,20,1)
GitHub: repository
It's not clear whether the OP still wants an answer, but I will post the one I linked in the comment, with a few modifications.
Timeseries datasets can be of different types. Let's consider a dataset which has X as features and Y as labels. Depending on the problem, Y might be a sample from X shifted in time, or it can be another target variable you want to predict.
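import numpy as np

def create_dataset(X, Y, look_back=10, label_lag=-1, stride=1):
    dataX, dataY = [], []
    # Stop early enough that neither the window nor its label runs past the end.
    stop = len(X) - look_back - max(label_lag, -1)
    for i in range(0, stop, stride):
        dataX.append(X[i:(i + look_back)])          # window of look_back rows
        dataY.append(Y[i + look_back + label_lag])  # label relative to the window end
    return np.array(dataX), np.array(dataY)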
print(features.values.shape,labels.shape)
#(619,4), (619,1)
x,y = create_dataset(X=features.values,Y=labels.values,look_back=10,stride=1)
(x.shape,y.shape)
#(610, 10, 4), (610, 1)
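To see those shapes without the original data, here is a minimal sketch on synthetic data. The column names cnt, temp and hum are borrowed from the question; the random values, the row count of 100 and the choice of cnt as the label are assumptions made only for illustration. The helper is restated so the block runs on its own, with the loop bound written so that the label index never runs past the end of Y.

```python
import numpy as np
import pandas as pd

def create_dataset(X, Y, look_back=10, label_lag=-1, stride=1):
    """Slice X into windows of look_back rows and pick one label per window."""
    dataX, dataY = [], []
    # Stop so that neither the window nor its label runs past the end.
    stop = len(X) - look_back - max(label_lag, -1)
    for i in range(0, stop, stride):
        dataX.append(X[i:i + look_back])
        dataY.append(Y[i + look_back + label_lag])
    return np.array(dataX), np.array(dataY)

# Synthetic stand-in for bike_data (values are random, not real data).
rng = np.random.default_rng(0)
bike_data = pd.DataFrame(rng.random((100, 3)), columns=["cnt", "temp", "hum"])

features = bike_data[["cnt", "temp", "hum"]]
labels = bike_data[["cnt"]]

x, y = create_dataset(features.values, labels.values, look_back=20)
print(x.shape, y.shape)  # (81, 20, 3) (81, 1)
```

Here x comes out directly in the (samples, n_steps, n_var) layout that the question builds with np.reshape, so no further reshaping of the windows is needed.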
Use of the other parameters:
label_lag: if the last row of an X sample is at time t, the Y sample will be at time t+1+label_lag. The default value (-1) puts both X and Y at the same index t. For example, the indices of the last row of sample x[1] and of its label y[1] in the original arrays:
if label_lag is -1:
np.where(x[1,-1]==features.values)[0],np.where(y[1] == labels.values)[0]
#(10,10,10,10), (10)
if label_lag is 0:
np.where(x[1,-1]==features.values)[0],np.where(y[1] == labels.values)[0]
#(10,10,10,10), (11)
look_back: the number of past rows of your dataset included in each sample, counted back from the current timestep t. A look_back of 10 means a single sample holds the rows from t-9 to t.
stride: the index gap between the starts of two consecutive samples. With stride=2 and look_back=10, if the 1st sample of X has the rows from index 0 to 9, then the 2nd sample will have the rows from index 2 to 11.
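The effect of the three parameters can be checked on a toy array whose values equal their own row index, so that reading a value tells you which row it came from. This is only a sketch: the array of 30 rows is an assumption, and the helper is restated so the block runs on its own, with the loop bound written so that label_lag=0 does not index past the end of Y.

```python
import numpy as np

def create_dataset(X, Y, look_back=10, label_lag=-1, stride=1):
    """Slice X into windows of look_back rows and pick one label per window."""
    dataX, dataY = [], []
    # Stop so that neither the window nor its label runs past the end.
    stop = len(X) - look_back - max(label_lag, -1)
    for i in range(0, stop, stride):
        dataX.append(X[i:i + look_back])
        dataY.append(Y[i + look_back + label_lag])
    return np.array(dataX), np.array(dataY)

# Toy data: row k of X holds the value k, and so does Y.
X = np.arange(30).reshape(-1, 1)
Y = np.arange(30)

# label_lag=-1 (default): the label sits at the last index of its window.
x, y = create_dataset(X, Y, look_back=10, label_lag=-1)
print(x[1, -1, 0], y[1])  # 10 10

# label_lag=0: the label sits one step after the window.
x0, y0 = create_dataset(X, Y, look_back=10, label_lag=0)
print(x0[1, -1, 0], y0[1])  # 10 11

# stride=2: consecutive windows start two rows apart.
xs, ys = create_dataset(X, Y, look_back=10, stride=2)
print(xs[0, 0, 0], xs[0, -1, 0])  # 0 9
print(xs[1, 0, 0], xs[1, -1, 0])  # 2 11
```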
Furthermore, you can also have a lookback in Y, depending on your problem, and Y can also be multi-dimensional. In that case the only change is b=Y[i:(i+look_back+label_lag)].
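A sketch of that variant, under the assumption that the rest of the loop stays as it is. The name create_dataset_seq is hypothetical, and label_lag=0 is chosen here so that each Y window covers the same rows as its X window; whether that alignment suits a given forecasting task is a separate decision.

```python
import numpy as np

def create_dataset_seq(X, Y, look_back=10, label_lag=0, stride=1):
    """Like create_dataset, but each label is a window of Y rather than one row.

    Assumes label_lag <= 0, so every Y slice stays inside the array.
    """
    dataX, dataY = [], []
    for i in range(0, len(X) - look_back + 1, stride):
        dataX.append(X[i:i + look_back])
        # The one changed line: slice Y instead of indexing a single row.
        dataY.append(Y[i:i + look_back + label_lag])
    return np.array(dataX), np.array(dataY)

X = np.arange(90).reshape(30, 3)  # 30 timesteps, 3 variables (toy values)
Y = np.arange(30).reshape(30, 1)  # 30 timesteps, 1 target column

x, y = create_dataset_seq(X, Y, look_back=10)
print(x.shape, y.shape)  # (21, 10, 3) (21, 10, 1)
```

The resulting y has the (samples, n_steps, 1) layout that the failing np.reshape call in the question was aiming for.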
The same functionality can be achieved with TimeseriesGenerator from keras.
TimeseriesGenerator(features.values,labels.values,length=10,batch_size=64,stride=1)
where length is the same as look_back. By default there is a gap of 1 between features and labels, i.e. a sample in X will be from t-9 to t and the corresponding sample in Y will be at index t+1. If you want both at the same indices, just shift the labels by one before passing them to the generator.
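The one-step gap and the shift can be checked with plain NumPy. The helper below, generator_pairs, is a hypothetical stand-in written for this sketch and is not part of Keras; it assumes the generator pairs the window data[i-length:i] with targets[i], which matches the gap described above.

```python
import numpy as np

def generator_pairs(data, targets, length):
    """Hypothetical NumPy stand-in: data[i-length:i] is paired with targets[i]."""
    xs = [data[i - length:i] for i in range(length, len(data))]
    ys = [targets[i] for i in range(length, len(data))]
    return np.array(xs), np.array(ys)

data = np.arange(30).reshape(-1, 1)  # toy values equal to their row index
targets = np.arange(30)

x, y = generator_pairs(data, targets, length=10)
print(x[0, -1, 0], y[0])  # 9 10  (label is one step after the window)

# Shift the labels forward by one so that position i holds the label of row i-1.
# np.roll wraps the last value round to position 0; that slot is never read
# here because the first target used is at index `length`.
shifted = np.roll(targets, 1)
x2, y2 = generator_pairs(data, shifted, length=10)
print(x2[0, -1, 0], y2[0])  # 9 9  (label now shares the window's last index)
```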