 

LSTM - Predicting the same constant values after a while

I have a variable that I want to forecast for the next 30 years. Unfortunately, I don't have many samples.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'FISCAL_YEAR': [1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988,
                    1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
                    1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008,
                    2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'VALS': [1341.9, 1966.95, 2085.75, 2087.1000000000004, 2760.75, 3461.4,
             3156.3, 3061.8, 2309.8500000000004, 2320.65, 2535.3, 2964.6000000000004,
             2949.75, 2339.55, 2327.4, 2571.75, 2299.05, 1560.6000000000001,
             1370.25, 1301.4, 1215.0, 5691.6, 6281.55, 6529.950000000001,
             17666.100000000002, 14467.95, 15205.050000000001, 14717.7, 14426.1,
             12946.5, 13000.5, 12761.550000000001, 13076.1, 13444.650000000001,
             13444.650000000001, 13321.800000000001, 13536.45, 13331.25,
             12630.6, 12741.300000000001, 12658.95]})

Here is my code:

import gc
from numpy.random import seed
from tensorflow import set_random_seed
from keras.backend import clear_session
from keras.layers import CuDNNLSTM, Dense, LSTM, Masking
from keras.models import Sequential

def build_model(n_neurons, dropout, s, cudnn=False):
    lstm = Sequential()
    if cudnn:
        lstm.add(CuDNNLSTM(n_neurons, input_shape=(s[1], s[2])))
        n_epochs = 200
    else:
        lstm.add(Masking(mask_value=-1, input_shape=(s[1], s[2])))
        lstm.add(LSTM(n_neurons, dropout=dropout))
        n_epochs = 500

    lstm.add(Dense(1))
    #lstm.add(Activation('softmax'))
    lstm.compile(loss='mean_squared_error', optimizer='adam')
    return lstm, n_epochs

def create_df(dfin,fwd,lstmws):
    ''' Input Normalization '''
    idx = dfin.FISCAL_YEAR.values[fwd:]
    dfx = dfin[[varn]].copy()
    dfy = dfin[[varn]].copy()

    # LSTM window - use last lstmws values
    for i in range(0,lstmws-1):
        dfx = dfx.join(dfin[[varn]].shift(-i-1),how='left',rsuffix='{:02d}'.format(i+1))

    dfx = (dfx-vmnx).divide(vmxx-vmnx)
    dfx.fillna(-1,inplace=True) # replace missing values with -1

    dfy = (dfy-vmnx).divide(vmxx-vmnx)
    dfy.fillna(-1,inplace=True) # replace missing values with -1
    return dfx,dfy,idx

def forecast(dfin,dfx,lstm,idx,gapyr=1):
    ''' Model Forecast '''
    xhat = dfx.values
    xhat = xhat.reshape(xhat.shape[0],lstmws,int(xhat.shape[1]/lstmws))
    yhat = lstm.predict(xhat)

    yhat = yhat*(vmxx-vmnx)+vmnx
    dfout = pd.DataFrame(list(zip(idx+gapyr,yhat.reshape(1,-1)[0])),columns=['FISCAL_YEAR',varn])
    dfout = pd.concat([dfin.head(1),dfout],axis=0).reset_index(drop=True)
    #append last prediction to X and use for prediction
    dfin = pd.concat([dfin,dfout.tail(1)],axis=0).reset_index(drop=True)
    return dfin

def lstm_training(dfin,lstmws,fwd,num_years,batchsize=4,cudnn=False,n_neurons=47,dropout=0.05,retrain=False):
    ''' LSTM Parameter '''
    seed(2018)
    set_random_seed(2018)
    gapyr = 1 # Forecast +1 Year

    dfx,dfy,idx = create_df(dfin,fwd,lstmws)

    X,y = dfx.iloc[fwd:-gapyr].values,dfy[fwd+gapyr:].values[:,0]
    X,y = X.reshape(X.shape[0],lstmws,int(X.shape[1]/lstmws)),y.reshape(len(y), 1)

    lstm, n_epochs = build_model(n_neurons, dropout, X.shape, cudnn)
    ''' LSTM Training Start '''
    if batchsize == 1:
        history_i = lstm.fit(X,y,epochs=25,batch_size=batchsize,verbose=0,shuffle=False)
    else:
        history_i = lstm.fit(X,y,epochs=n_epochs,batch_size=batchsize,verbose=0,shuffle=False)

    dfin = forecast(dfin,dfx,lstm,idx)


    lstm.reset_states()
    if not retrain:
        for fwd in range(1,num_years):

            dfx,dfy,idx = create_df(dfin,fwd,lstmws)

            dfin = forecast(dfin,dfx,lstm,idx)

            lstm.reset_states()

    del dfy,X,y,lstm
    gc.collect()
    clear_session()
    return dfin, history_i

varn = "VALS"
#LSTM-window
lstmws = 10
vmnx,vmxx = df[varn].astype(float).min(),df[varn].astype(float).max()
dfin, history_i = lstm_training(df, lstmws, 0, 2051-2018)

In my first version I retrained the model every time after appending a new prediction, and the predictions never became constant. But because it is very time consuming to retrain after every new observation, I had to change the approach.
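A rough sketch of that retrain-after-every-prediction approach, reusing the functions above (the loop bounds and hyperparameters are illustrative):

dfin = df.copy()
for fwd in range(0, 2051 - 2018):
    dfx, dfy, idx = create_df(dfin, fwd, lstmws)
    # same reshaping as in lstm_training: (samples, lstmws, features)
    X = dfx.iloc[fwd:-1].values.reshape(-1, lstmws, 1)
    y = dfy[fwd + 1:].values[:, 0].reshape(-1, 1)
    lstm, n_epochs = build_model(47, 0.05, X.shape, cudnn=False)
    lstm.fit(X, y, epochs=n_epochs, batch_size=4, verbose=0, shuffle=False)
    dfin = forecast(dfin, dfx, lstm, idx)  # append the +1 year prediction
    clear_session()  # rebuild the model from scratch on the next iteration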

My result:

dfin.VALS.values
array([  1341.9       ,   1966.95      ,   2085.75      ,   2087.1       ,
     2760.75      ,   3461.4       ,   3156.3       ,   3061.8       ,
     2309.85      ,   2320.65      ,   2535.3       ,   2964.6       ,
     2949.75      ,   2339.55      ,   2327.4       ,   2571.75      ,
     2299.05      ,   1560.6       ,   1370.25      ,   1301.4       ,
     1215.        ,   5691.6       ,   6281.55      ,   6529.95      ,
    17666.1       ,  14467.95      ,  15205.05      ,  14717.7       ,
    14426.1       ,  12946.5       ,  13000.5       ,  12761.55      ,
    13076.1       ,  13444.65      ,  13444.65      ,  13321.8       ,
    13536.45      ,  13331.25      ,  12630.6       ,  12741.3       ,
    12658.95      ,  10345.97167969,  12192.12792969,  13074.4296875 ,
    13264.40917969,  12956.1796875 ,  12354.1953125 ,  11659.03125   ,
    11044.06933594,  10643.19921875,  10552.52246094,  10552.52246094,
    10552.52246094,  10552.52246094,  10552.52246094,  10552.52246094,
    10552.52246094,  10552.52246094,  10552.52246094,  10552.52246094,
    10552.52246094,  10552.52246094,  10552.52246094,  10552.52246094,
    10552.52246094,  10552.52246094,  10552.52246094,  10552.52246094,
    10552.52246094,  10552.52246094,  10552.52246094,  10552.52246094,
    10552.52246094,  10552.52246094])

How can I avoid getting the same prediction for the last 20+ years?

EDIT:

I prepended more random data to see whether the constant output is caused by the small sample size, but the predictions again become constant after a while.

df0 = pd.DataFrame([range(1900,1979),list(np.random.rand(1979-1900)*(vmxx-vmnx)+vmnx)],index=["FISCAL_YEAR","VALS"]).T
df = pd.concat([df0,df])
df["FISCAL_YEAR"] = df["FISCAL_YEAR"].astype(int)
df.index = range(1900,2020)

A strange thing I have observed is that the predictions become constant after 10 years, i.e. after one window length, but if I increase lstmws to 20, they only become constant after 20 years, as if the output freezes once the input window is made up entirely of the model's own predictions (a toy illustration of this feedback effect follows the result below):

lstmws = 20

Result:

{'FISCAL_YEAR': [2020,  2021,  2022,  2023,  2024,  2025,  2026,  2027,  2028,  2029,  2030,  2031,  2032,  2033,  2034,  2035,  2036,  2037,  2038,  2039,  2040,  2041,  2042,  2043,  2044,  2045,  2046,  2047,  2048,  2049,  2050,  2051,  2052],
 'VALS': [11183.32421875,  12388.28125,  13151.013671875,  12543.6796875,  12590.0888671875,  12002.583984375,  11822.8857421875,  11479.6572265625,  11423.1279296875,  11444.5751953125,  11506.60546875,  11563.3173828125,  11595.0029296875,  11599.8955078125,  11586.8037109375,  11571.337890625,  11574.541015625,  11620.7900390625,  11734.2431640625,  11934.216796875,  11934.216796875,  11934.216796875,  11934.216796875,  11934.216796875,  11934.216796875,  11934.216796875,  11934.216796875,  11934.216796875,  11934.216796875,  11934.216796875,  11934.216796875,  11934.216796875,  11934.216796875]}
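A toy illustration of that feedback effect (hypothetical, and not tied to the exact windowing above): once the window only holds the model's own output, the forecast is just iterating a deterministic function, and such an iteration can settle onto a fixed point:

import numpy as np

def f(window):
    # stand-in for lstm.predict on a single window; fixed point at 10000
    return 0.05 * np.mean(window) + 9500.0

window = list(np.random.rand(10) * 10000)  # pretend these are real observations
preds = []
for _ in range(40):
    yhat = f(np.array(window))
    preds.append(yhat)
    window = window[1:] + [yhat]  # drop the oldest value, append the prediction

print(np.round(preds[-5:], 1))  # the forecast has flattened onto the fixed point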
asked Nov 03 '18 by TobSta

1 Answer

In my experience with LSTMs (I've been generating dance sequences like this), I've found that a few things in particular help prevent the model from stagnating and predicting the same output.

Adding Mixture Density Layers

First, it's helpful to use a Mixture Density Network instead of an L2 loss (which is what you have with mean squared error). Read Christopher Bishop's paper on MDN layers for the details, but in short: an L2 loss pushes the model toward predicting the conditional mean of y given x. If for one value x there are several plausible outputs y0, y1, y2, each with some probability (as many complex systems have), you'll want to consider an MDN layer trained with a negative log-likelihood loss. Here is a Keras implementation that I'm using.

Reading your situation a little more closely now, this may not be helpful for your case, as you seem to be predicting a time series for which by definition each x maps to a single y.
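Still, for completeness, a minimal sketch of what an MDN output head could look like in Keras (illustrative only; n_mixes, the layer sizes, and the loss function below are assumptions, not the implementation referred to above):

import numpy as np
from keras import backend as K
from keras.layers import Dense, Input, LSTM, concatenate
from keras.models import Model

n_mixes = 5                  # number of Gaussian components (assumption)
seq_len, n_feat = 10, 1      # e.g. an lstmws window with a single feature

inp = Input(shape=(seq_len, n_feat))
h = LSTM(47)(inp)
pi = Dense(n_mixes, activation='softmax')(h)       # mixture weights
mu = Dense(n_mixes)(h)                             # component means
sigma = Dense(n_mixes, activation='softplus')(h)   # positive standard deviations
mdn = Model(inp, concatenate([pi, mu, sigma]))

def mdn_nll(y_true, y_pred):
    # negative log likelihood of the scalar target under the predicted mixture
    pi = y_pred[:, :n_mixes]
    mu = y_pred[:, n_mixes:2 * n_mixes]
    sigma = y_pred[:, 2 * n_mixes:] + K.epsilon()
    prob = pi * K.exp(-0.5 * K.square((y_true - mu) / sigma)) / (sigma * np.sqrt(2.0 * np.pi))
    return -K.log(K.sum(prob, axis=-1) + K.epsilon())

mdn.compile(loss=mdn_nll, optimizer='adam')

At prediction time you would sample from the predicted mixture (or take the mean of the most probable component) instead of reading a single Dense(1) output.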

Feeding the LSTM longer sequences

Next, I've found it helpful to feed my LSTM n sequence values prior to the one I'm trying to predict. The larger n is, the better the results I've found (though the slower the training goes). Many papers I've read use 1024 prior sequence values to predict the next sequence value.

You don't have many observations, but you could try feeding the prior 8 observations in to predict the next observation.
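With the data from the question, building such windows might look like this (a sketch; the window length of 8 and the min-max scaling step are illustrative):

import numpy as np

window = 8
vals = df['VALS'].values.astype(float)
vals = (vals - vals.min()) / (vals.max() - vals.min())  # scale to [0, 1]

X, y = [], []
for i in range(len(vals) - window):
    X.append(vals[i:i + window])   # the prior `window` observations
    y.append(vals[i + window])     # the value to predict
X = np.asarray(X).reshape(-1, window, 1)
y = np.asarray(y).reshape(-1, 1)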

Ensuring output data has the same structure as training data

Finally, I've ended up here after several years because I was training a model with a categorical crossentropy loss and one hot vectors as input. When I was generating sequences with my trained model, I was using:

# this predicts the same value over and over
predict_length = 100
sequence = X[0]
for i in range(predict_length):
  # note that z is a dense vector -- it needs to be converted to one hot!
  z = model.predict( np.expand_dims( sequence[-sequence_length:], 0 ) )
  sequence = np.vstack([sequence, z])

I should have been converting my output predictions to one hot vectors:

# this predicts new values :)
predict_length = 1000
sequence = X[0]
for i in range(predict_length):
  # z is still a dense vector; we'll convert it to one-hot below
  z = model.predict( np.expand_dims( sequence[-sequence_length:], 0 ) ).squeeze()
  # let's convert z to a one hot vector to match the training data
  prediction = np.zeros(len(types),)
  prediction[ np.argmax(z) ] = 1
  sequence = np.vstack([sequence, prediction])

I suspect this last step is the reason most people will end up at this thread!

answered Oct 14 '22 by duhaime