
Make Multiple Shifted (Lagged) Columns in Pandas

Tags:

python

pandas

I have a time-series DataFrame and I want to replicate each of my 200 features/columns as additional lagged features. At the moment I have features at time t and want to create features at time steps t-1, t-2, and so on.

I know this is best done with df.shift(), but I'm having trouble putting it all together. I also want to rename the new columns to 'feature (t-1)', 'feature (t-2)', etc.

My pseudo-code attempt would be something like:

lagged_values = [1,2,3,10]
for every lagged_values
    for every column, make a new feature column with df.shift(lagged_values)
    make new column have name 'original col name'+'(t-(lagged_values))'

In the end, if I have 200 columns and 4 lags, I would have a new df with 1,000 features (200 each at t, t-1, t-2, t-3, and t-10).
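A direct translation of the pseudocode above into pandas might look like the following sketch. The column names `feat_a`/`feat_b` and the toy data are placeholders for illustration, and only three lags are used to keep the result small. It collects the new columns in a dict and joins them with a single `pd.concat` instead of adding them one at a time.

```python
import numpy as np
import pandas as pd

# Placeholder frame standing in for the real 200-column data (deterministic toy values)
df = pd.DataFrame(np.arange(12, dtype=float).reshape(6, 2), columns=['feat_a', 'feat_b'])

lagged_values = [1, 2, 3]

# Build every lagged copy first, keyed by its new name
new_cols = {}
for lag in lagged_values:
    for col in df.columns:
        # shift(lag) moves values down by `lag` rows, so row t holds the value from row t-lag
        new_cols[f'{col} (t-{lag})'] = df[col].shift(lag)

# Join the original features and all lagged features in one step
lagged_df = pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)
print(lagged_df.columns.tolist())
```

With 2 features and 3 lags this yields 2 × (1 + 3) = 8 columns, the same arithmetic as 200 × 5 = 1,000 in the question.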

I have found something similar on Machine Learning Mastery, but it doesn't keep the original column names (it renames them to var1, var2, etc.). Unfortunately, I don't understand it well enough to adapt it to my problem.

from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg
swifty asked Feb 15 '18 23:02
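One way to adapt the Machine Learning Mastery function so that it keeps the original column names and accepts an arbitrary list of lags (such as [1, 2, 3, 10]) is sketched below. The name `series_to_supervised_named` and its `lags` parameter are hypothetical, not part of pandas or the original article, and the sketch assumes the input is a DataFrame with named columns rather than a list or bare NumPy array.

```python
import pandas as pd

def series_to_supervised_named(df, lags=(1,), n_out=1, dropnan=True):
    """Hypothetical variant of series_to_supervised that keeps column names.

    df: DataFrame whose columns are the features.
    lags: iterable of positive lag sizes, e.g. [1, 2, 3, 10].
    n_out: number of current/future steps (t, t+1, ...) to include.
    """
    cols, names = [], []
    # Input sequence: one shifted copy of the frame per requested lag, oldest first
    for lag in sorted(lags, reverse=True):
        cols.append(df.shift(lag))
        names += [f'{c} (t-{lag})' for c in df.columns]
    # Forecast sequence (t, t+1, ... t+n_out-1)
    for i in range(n_out):
        cols.append(df.shift(-i))
        suffix = '(t)' if i == 0 else f'(t+{i})'
        names += [f'{c} {suffix}' for c in df.columns]
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    # Drop rows made incomplete by shifting
    if dropnan:
        agg = agg.dropna()
    return agg

# Usage on made-up data
data = pd.DataFrame({'temp': [10.0, 11.0, 12.0, 13.0, 14.0],
                     'load': [1.0, 2.0, 3.0, 4.0, 5.0]})
out = series_to_supervised_named(data, lags=[1, 2])
print(out.columns.tolist())
# ['temp (t-2)', 'load (t-2)', 'temp (t-1)', 'load (t-1)', 'temp (t)', 'load (t)']
```

The only structural change from the original is that the lag loop iterates over the given lag sizes instead of `range(n_in, 0, -1)`, and the names are built from `df.columns` instead of `var%d`.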


People also ask

How do I create a lag column in pandas?

You can use the shift() function in pandas to create a column containing the lagged values of another column. The integer passed to shift() is the number of rows to lag by.
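A minimal sketch of a single lag column; the `sales` column and its values are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'sales': [5, 7, 6, 9]})
# shift(1): each row now holds the previous row's value;
# the first row has no predecessor, so it becomes NaN
df['sales_lag1'] = df['sales'].shift(1)
print(df)
```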

How do I move two columns in pandas?

If you want to shift columns, or subtract each row's previous value from the current one, use the shift() function. Its periods parameter sets how many steps to shift along the chosen axis.
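A short sketch of shifting two columns in one call and subtracting the previous row; the columns `A` and `B` are toy data. The subtraction gives the same result as `DataFrame.diff()`:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 9, 16], 'B': [2, 3, 5, 8]})
# Shift both columns down by one row at once; `periods` is the number of steps
prev = df[['A', 'B']].shift(periods=1)
# Current value minus previous value (first row is NaN because it has no predecessor)
change = df[['A', 'B']] - prev
print(change)
```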

How do you make a variable lag in Python?

Create lag variables using the shift function: shift(1) creates a lag of a single record, while shift(5) creates a lag of five records. This bases the lag on prior observations, but shift can also take a time offset (the freq argument) to shift by a span of time instead of a number of records.
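The sketch below contrasts a positional lag with a time-offset lag on a daily `DatetimeIndex` (the dates and values are arbitrary). With `freq`, `shift` moves the index rather than the values, so the result is reindexed onto the original dates to line the lagged values up:

```python
import pandas as pd

idx = pd.date_range('2024-01-01', periods=4, freq='D')
s = pd.Series([10, 20, 30, 40], index=idx)

# Positional lag: values move down two records, the index stays the same
lag2 = s.shift(2)

# Time-offset lag: the index moves forward one day, values are unchanged
lag_1day = s.shift(1, freq='D')
# Align back onto the original dates so each date holds the previous day's value
aligned = lag_1day.reindex(s.index)
print(aligned)
```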

Is there a lag function in pandas?

In Python, the pandas library includes built-in functionality for many tasks that take only a few lines of code, including creating lags and leads of a column. A lag shifts a column down by a given number of rows; a lead shifts it up. In pandas both are done with shift(): a positive number of periods gives a lag and a negative number gives a lead.
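A minimal sketch on made-up data, with a positive `periods` for the lag and a negative one for the lead:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
lag = s.shift(1)    # positive: values move down, each row sees the previous value
lead = s.shift(-1)  # negative: values move up, each row sees the next value
print(pd.DataFrame({'value': s, 'lag': lag, 'lead': lead}))
```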


1 Answer

You can create the additional columns with a dictionary comprehension and then add them to your DataFrame via assign.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), columns=list('AB'))

lags = range(1, 3)  # Just two lags for demonstration.

>>> df.assign(**{
    f'{col} (t-{lag})': df[col].shift(lag)
    for col in df  # columns in the outer loop keep each feature's lags together
    for lag in lags
})
          A         B   A (t-1)   A (t-2)   B (t-1)   B (t-2)
0 -0.773571  1.945746       NaN       NaN       NaN       NaN
1  1.375648  0.058043 -0.773571       NaN  1.945746       NaN
2  0.727642  1.802386  1.375648 -0.773571  0.058043  1.945746
3 -2.427135 -0.780636  0.727642  1.375648  1.802386  0.058043
4  1.542809 -0.620816 -2.427135  0.727642 -0.780636  1.802386
Alexander answered Sep 30 '22 15:09
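With 200 features and four lags, `assign` adds 800 columns one at a time, and recent pandas versions may emit a `PerformanceWarning` about a highly fragmented DataFrame in that situation. A sketch of an alternative that shifts the whole frame once per lag, renames each shifted copy with `add_suffix`, and joins everything in a single `pd.concat` follows. The 200 random columns `f0` to `f199` are placeholders for the real features.

```python
import numpy as np
import pandas as pd

# Placeholder wide frame: 20 rows, 200 random feature columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((20, 200)), columns=[f'f{i}' for i in range(200)])
lags = [1, 2, 3, 10]

# One shifted copy of the whole frame per lag, with ' (t-lag)' appended to every name
lagged = [df.shift(lag).add_suffix(f' (t-{lag})') for lag in lags]
# Original features plus all lagged copies, joined in one operation
result = pd.concat([df] + lagged, axis=1)
print(result.shape)  # (20, 1000)
```

The columns come out grouped by lag (all t-1 columns, then all t-2, and so on) rather than grouped by feature as in the `assign` version; the values are the same either way.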