Make Multiple Shifted (Lagged) Columns in Pandas

Tags: python, pandas

I have a time-series DataFrame and I want to replicate each of my 200 features/columns as additional lagged features. So at the moment I have features at time t and want to create features at timestep t-1, t-2 and so on.

I know this is best done with df.shift(), but I'm having trouble putting it all together. I also want to rename the columns to 'feature (t-1)', 'feature (t-2)', and so on.

My pseudo-code attempt would be something like:

lagged_values = [1, 2, 3, 10]
for every lag in lagged_values
    for every column, make a new feature column with df[column].shift(lag)
    name the new column 'original col name' + ' (t-lag)'

In the end, if I have 200 columns and 4 lagged timesteps, I would have a new df with 1,000 features (200 each at t, t-1, t-2, t-3 and t-10).

I have found something similar on Machine Learning Mastery, but it doesn't keep the original column names (it renames them to var1, var2, etc.). Unfortunately, I don't understand it well enough to adapt it to my problem.

from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

asked Feb 15 '18 23:02

swifty

1 Answer

You can create the additional columns using a dictionary comprehension and then add them to your dataframe via assign.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), columns=list('AB'))

lags = range(1, 3)  # Just two lags for demonstration.

>>> df.assign(**{
    f'{col} (t-{lag})': df[col].shift(lag)
    for col in df
    for lag in lags
})
          A         B   A (t-1)   A (t-2)   B (t-1)   B (t-2)
0 -0.773571  1.945746       NaN       NaN       NaN       NaN
1  1.375648  0.058043 -0.773571       NaN  1.945746       NaN
2  0.727642  1.802386  1.375648 -0.773571  0.058043  1.945746
3 -2.427135 -0.780636  0.727642  1.375648  1.802386  0.058043
4  1.542809 -0.620816 -2.427135  0.727642 -0.780636  1.802386
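
For a wide frame such as the 200 columns in the question, one alternative is to shift the whole frame once per lag and join the pieces with a single concat, rather than building each column separately. The following is only a sketch under that assumption: add_lagged_columns is a hypothetical helper name, not part of pandas, and the new columns come out grouped by lag rather than by feature.

```python
import numpy as np
import pandas as pd


def add_lagged_columns(df, lags):
    """Return df plus one shifted copy of every column per lag.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Shift the whole frame once per lag and label the copies with that lag.
    shifted = [df.shift(lag).add_suffix(f' (t-{lag})') for lag in lags]
    # One concat call joins the original frame and all shifted copies.
    return pd.concat([df, *shifted], axis=1)


# Small deterministic frame so the result is easy to inspect.
df = pd.DataFrame(np.arange(10.0).reshape(5, 2), columns=list('AB'))
lagged = add_lagged_columns(df, lags=[1, 2])
print(lagged.columns.tolist())
```

Passing lags=[1, 2, 3, 10] to the same helper on a 200-column frame should give the 1,000 columns described in the question. Calling .dropna() on the result discards the leading rows that have no full history.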

answered Sep 30 '22 15:09

Alexander
