I have a dataframe with panel data, let's say it's time series for 100 different objects:
object period value 1 1 24 1 2 67 ... 1 1000 56 2 1 59 2 2 46 ... 2 1000 64 3 1 54 ... 100 1 451 100 2 153 ... 100 1000 21
I want to add a new column prev_value
that will store previous value
for each object:
object period value prev_value 1 1 24 nan 1 2 67 24 ... 1 99 445 1243 1 1000 56 445 2 1 59 nan 2 2 46 59 ... 2 1000 64 784 3 1 54 nan ... 100 1 451 nan 100 2 153 451 ... 100 1000 21 1121
Can I use .shift() and .groupby() somehow to do that?
shift() function Shift index by desired number of periods with an optional time freq. This function takes a scalar parameter called the period, which represents the number of shifts to be made over the desired axis. This function is very helpful when dealing with time-series data.
shift() If you want to shift your column or subtract the column value with the previous row value from the DataFrame, you can do it by using the shift() function. It consists of a scalar parameter called period, which is responsible for showing the number of shifts to be made over the desired axis.
Pandas' groupby() allows us to split data into separate groups to perform computations for better analysis. In this article, you'll learn the “group by” process (split-apply-combine) and how to use Pandas's groupby() function to group data and perform operations.
shift() function to shift the data of the given Series object by -2 periods. Output : Now we will use Series. shift() function to shift the data in the given series object by -2 periods.
Pandas' grouped objects have a groupby.DataFrameGroupBy.shift
method, which will shift a specified column in each group n periods
, just like the regular dataframe's shift
method:
df['prev_value'] = df.groupby('object')['value'].shift()
For the following example dataframe:
print(df) object period value 0 1 1 24 1 1 2 67 2 1 4 89 3 2 4 5 4 2 23 23
The result would be:
object period value prev_value 0 1 1 24 NaN 1 1 2 67 24.0 2 1 4 89 67.0 3 2 4 5 NaN 4 2 23 23 5.0
IFF your DataFrame is already sorted by the grouping keys you can use a single shift
on the entire DataFrame and where
to NaN
the rows that overflow into the next group. For larger DataFrames with many groups this can be a bit faster.
df['prev_value'] = df['value'].shift().where(df.object.eq(df.object.shift())) object period value prev_value 0 1 1 24 NaN 1 1 2 67 24.0 2 1 4 89 67.0 3 2 4 5 NaN 4 2 23 23 5.0
Some performance related timings:
import perfplot import pandas as pd import numpy as np perfplot.show( setup=lambda N: pd.DataFrame({'object': np.repeat(range(N), 5), 'value': np.random.randint(1, 1000, 5*N)}), kernels=[ lambda df: df.groupby('object')['value'].shift(), lambda df: df['value'].shift().where(df.object.eq(df.object.shift())), ], labels=["GroupBy", "Where"], n_range=[2 ** k for k in range(1, 22)], equality_check=lambda x,y: np.allclose(x, y, equal_nan=True), xlabel="# of Groups" )
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With