Use pandas.shift() within a group

Tags: python, pandas, pandas-groupby

I have a dataframe with panel data, let's say it's time series for 100 different objects:

```
object  period  value
1       1       24
1       2       67
...
1       1000    56
2       1       59
2       2       46
...
2       1000    64
3       1       54
...
100     1       451
100     2       153
...
100     1000    21
```

I want to add a new column prev_value that stores the previous value for each object:

```
object  period  value  prev_value
1       1       24     nan
1       2       67     24
...
1       99      445    1243
1       1000    56     445
2       1       59     nan
2       2       46     59
...
2       1000    64     784
3       1       54     nan
...
100     1       451    nan
100     2       153    451
...
100     1000    21     1121
```

Can I use .shift() and .groupby() somehow to do that?


asked Nov 16 '18 10:11

Alexandr Kapshuk

2 Answers

Pandas GroupBy objects have a shift method (groupby.DataFrameGroupBy.shift) that shifts the selected column by n periods within each group, just like the regular DataFrame shift method:

```python
df['prev_value'] = df.groupby('object')['value'].shift()
```

For the following example dataframe:

```
print(df)
   object  period  value
0       1       1     24
1       1       2     67
2       1       4     89
3       2       4      5
4       2      23     23
```

The result would be:

```
   object  period  value  prev_value
0       1       1     24         NaN
1       1       2     67        24.0
2       1       4     89        67.0
3       2       4      5         NaN
4       2      23     23         5.0
```
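A self-contained sketch reproducing the example above. Note that shift works on row position, not on the period values, so it assumes the rows within each object are already in period order; sorting with sort_values(['object', 'period']) first is one way to guarantee that. The periods argument also accepts other integers, e.g. -1 to take the next value instead of the previous one (next_value is just an illustrative column name).

```python
import pandas as pd

# Example data from the answer above
df = pd.DataFrame({
    "object": [1, 1, 1, 2, 2],
    "period": [1, 2, 4, 4, 23],
    "value": [24, 67, 89, 5, 23],
})

# shift() is positional, so make sure rows are in period order within each object
df = df.sort_values(["object", "period"])

# Previous value within each object (the first row of each group becomes NaN)
df["prev_value"] = df.groupby("object")["value"].shift()

# periods=-1 shifts the other way: the next value within each object
df["next_value"] = df.groupby("object")["value"].shift(-1)

print(df)
```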

answered Sep 20 '22 17:09

yatu

Only if your DataFrame is already sorted by the grouping key (so that each group's rows are contiguous) can you use a single shift on the entire column, followed by where to set to NaN the rows whose shifted value spilled over from the previous group. For large DataFrames with many groups this can be somewhat faster.

```python
df['prev_value'] = df['value'].shift().where(df.object.eq(df.object.shift()))
```

Result:

```
   object  period  value  prev_value
0       1       1     24         NaN
1       1       2     67        24.0
2       1       4     89        67.0
3       2       4      5         NaN
4       2      23     23         5.0
```
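A minimal sketch of why the sorting condition matters: on sorted data the shift/where approach matches groupby().shift(), but the mask only compares each row with the row directly above it, so if rows of the same object are interleaved the two results diverge. The interleaved row order below is a hypothetical arrangement made up purely for illustration.

```python
import numpy as np
import pandas as pd

def prev_by_groupby(df):
    # Works whether or not each group's rows are contiguous
    return df.groupby("object")["value"].shift()

def prev_by_where(df):
    # Only valid when each object's rows are contiguous (e.g. sorted by object)
    same_object = df["object"].eq(df["object"].shift())
    return df["value"].shift().where(same_object)

sorted_df = pd.DataFrame({
    "object": [1, 1, 1, 2, 2],
    "period": [1, 2, 4, 4, 23],
    "value": [24, 67, 89, 5, 23],
})

# Same rows with objects 1 and 2 interleaved (hypothetical ordering)
interleaved_df = sorted_df.iloc[[0, 3, 1, 4, 2]].reset_index(drop=True)

sorted_match = np.allclose(prev_by_groupby(sorted_df),
                           prev_by_where(sorted_df), equal_nan=True)
interleaved_match = np.allclose(prev_by_groupby(interleaved_df),
                                prev_by_where(interleaved_df), equal_nan=True)
print(sorted_match, interleaved_match)  # True False
```

On the interleaved frame, groupby still finds 24, 5 and 67 as previous values, while the where mask is False on every row and returns only NaN.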

Some performance-related timings:

```python
import perfplot
import pandas as pd
import numpy as np

perfplot.show(
    setup=lambda N: pd.DataFrame({'object': np.repeat(range(N), 5),
                                  'value': np.random.randint(1, 1000, 5*N)}),
    kernels=[
        lambda df: df.groupby('object')['value'].shift(),
        lambda df: df['value'].shift().where(df.object.eq(df.object.shift())),
    ],
    labels=["GroupBy", "Where"],
    n_range=[2 ** k for k in range(1, 22)],
    equality_check=lambda x, y: np.allclose(x, y, equal_nan=True),
    xlabel="# of Groups"
)
```

[Figure: perfplot timings of the "GroupBy" and "Where" approaches versus # of Groups]
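If perfplot isn't installed, a rougher comparison can be sketched with the standard library's timeit. The group count (n_groups = 50_000) and the repeat settings are arbitrary choices, and the timings depend on your machine and pandas version, so no particular numbers are implied here.

```python
import timeit

import numpy as np
import pandas as pd

n_groups = 50_000  # arbitrary size; 5 rows per group as in the perfplot setup
df = pd.DataFrame({
    "object": np.repeat(np.arange(n_groups), 5),
    "value": np.random.randint(1, 1000, 5 * n_groups),
})

def groupby_shift():
    return df.groupby("object")["value"].shift()

def where_shift():
    # Valid here because np.repeat produces contiguous, sorted groups
    return df["value"].shift().where(df["object"].eq(df["object"].shift()))

# Sanity check: both approaches agree on this sorted data
assert np.allclose(groupby_shift(), where_shift(), equal_nan=True)

timings = {}
for name, fn in [("GroupBy", groupby_shift), ("Where", where_shift)]:
    # Best of 3 repeats, each running the function 5 times
    best = min(timeit.repeat(fn, number=5, repeat=3))
    timings[name] = best / 5
    print(f"{name}: {timings[name]:.4f} s per call")
```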

answered Sep 20 '22 17:09

ALollz
