<h3>Problem</h3>
<p>I want to calculate <code>diff</code> by group, but I don't know how to sort the <code>time</code> column so that the differences within each group are in order and positive.</p>
<p>The original data:</p>
<pre class="prettyprint"><code>In [37]: df
Out[37]:
  id                time
0  A 2016-11-25 16:32:17
1  A 2016-11-25 16:36:04
2  A 2016-11-25 16:35:29
3  B 2016-11-25 16:35:24
4  B 2016-11-25 16:35:46
</code></pre>
<p>The result I want:</p>
<pre class="prettyprint"><code>Out[40]:
  id   time
0  A  00:35
1  A  03:12
2  B  00:22
</code></pre>
<p>Note: the <code>time</code> column in the result is of type <code>timedelta64[ns]</code>.</p>
<h3>What I tried</h3>
<pre class="prettyprint"><code>In [38]: df['time'].diff(1)
Out[38]:
0                 NaT
1            00:03:47
2   -1 days +23:59:25
3   -1 days +23:59:55
4            00:00:22
Name: time, dtype: timedelta64[ns]
</code></pre>
<p>This does not give the desired result: the differences ignore the groups, and some of them are negative.</p>
<h3>Hope</h3>
<p>The solution should not only solve the problem but also run fast, because the data has 50 million rows.</p>
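<p>For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the sample frame from the question and shows which rows come out negative with a plain <code>diff</code>. The variable names <code>naive</code> and <code>negative_rows</code> are only illustrative and do not appear in the original post.</p>

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "time": pd.to_datetime([
        "2016-11-25 16:32:17",
        "2016-11-25 16:36:04",
        "2016-11-25 16:35:29",
        "2016-11-25 16:35:24",
        "2016-11-25 16:35:46",
    ]),
})

# A plain diff subtracts the previous row from each row. It ignores the
# group boundaries and the fact that the rows are not in time order.
naive = df["time"].diff()

# Row 2 is earlier than row 1 within group A, and row 3 (group B) is
# compared with the last row of group A, so both differences are negative.
negative_rows = naive[naive < pd.Timedelta(0)].index.tolist()
print(negative_rows)
```

<p>The negative entries correspond to the <code>-1 days +23:59:25</code> and <code>-1 days +23:59:55</code> values in the output above.</p>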

<p>You can use <code>sort_values</code> together with <code>groupby</code> and a per-group <code>diff</code>. The result is aligned back to the original rows by index, and the earliest row of each group has no predecessor, so its value is <code>NaT</code>:</p>
<pre class="prettyprint"><code>df['diff'] = df.sort_values(['id','time']).groupby('id')['time'].diff()
print(df)
  id                time     diff
0  A 2016-11-25 16:32:17      NaT
1  A 2016-11-25 16:36:04 00:00:35
2  A 2016-11-25 16:35:29 00:03:12
3  B 2016-11-25 16:35:24      NaT
4  B 2016-11-25 16:35:46 00:00:22
</code></pre>
<p>If you need to remove the rows with <code>NaT</code> in column <code>diff</code>, use <code>dropna</code>. The remaining rows keep their original order:</p>
<pre class="prettyprint"><code>df = df.dropna(subset=['diff'])
print(df)
  id                time     diff
1  A 2016-11-25 16:36:04 00:00:35
2  A 2016-11-25 16:35:29 00:03:12
4  B 2016-11-25 16:35:46 00:00:22
</code></pre>
<p>Starting again from the original frame, you can also overwrite the <code>time</code> column:</p>
<pre class="prettyprint"><code>df['time'] = df.sort_values(['id','time']).groupby('id')['time'].diff()
print(df)
  id     time
0  A      NaT
1  A 00:00:35
2  A 00:03:12
3  B      NaT
4  B 00:00:22
</code></pre>
<hr>
<pre class="prettyprint"><code>df['time'] = df.sort_values(['id','time']).groupby('id')['time'].diff()
df = df.dropna(subset=['time'])
print(df)
  id     time
1  A 00:00:35
2  A 00:03:12
4  B 00:00:22
</code></pre>
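<p>Since the question mentions 50 million rows, here is a hedged sketch of a variant that avoids <code>groupby</code>: sort once, take a single <code>diff</code> over the whole column, and blank out the first row of each group. This is not part of the original answer, and the names <code>ordered</code>, <code>delta</code> and <code>same_group</code> are only illustrative. Whether it is actually faster depends on the data and the pandas version, so it is worth timing both approaches on a sample of the real data.</p>

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "time": pd.to_datetime([
        "2016-11-25 16:32:17",
        "2016-11-25 16:36:04",
        "2016-11-25 16:35:29",
        "2016-11-25 16:35:24",
        "2016-11-25 16:35:46",
    ]),
})

# Sort once so that every id forms a contiguous, time-ordered run.
ordered = df.sort_values(["id", "time"])

# One vectorised diff over the whole sorted column.
delta = ordered["time"].diff()

# True where the previous row has the same id; False on the first row of
# each group, where the diff would otherwise cross a group boundary.
same_group = ordered["id"].eq(ordered["id"].shift())

# Keep only the within-group differences; assignment aligns on the index,
# so the values land on the original (unsorted) rows.
df["diff"] = delta.where(same_group)
print(df)
```

<p>On the sample data this should give the same <code>diff</code> column as the <code>groupby</code> version above, and <code>dropna(subset=['diff'])</code> can be applied in the same way afterwards.</p>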

How to calculate time difference by group using pandas?

asked Nov 25 '16 11:11

Jack

1 Answer

answered Sep 17 '22 21:09

jezrael

Tags:

python

sorting

pandas

difference

timedelta