pandas rolling max with groupby

Tags: python, python-3.x, pandas, dataframe, group-by

I am having trouble getting the rolling function of Pandas to do what I want. For each row, I want to calculate the maximum so far within its group. Here is an example:

import pandas as pd
df = pd.DataFrame([[1,3], [1,6], [1,3], [2,2], [2,1]], columns=['id', 'value'])

which looks like

   id  value
0   1      3
1   1      6
2   1      3
3   2      2
4   2      1

Now I wish to obtain the following DataFrame:

   id  value
0   1      3
1   1      6
2   1      6
3   2      2
4   2      2

The problem is that when I do

df.groupby('id')['value'].rolling(1).max()

I get the same DataFrame back. And when I do

df.groupby('id')['value'].rolling(3).max()

I get a DataFrame with NaNs. Can someone explain how to properly use rolling, or some other Pandas function, to obtain the DataFrame I want?


asked May 07 '17 10:05

splinter

2 Answers

It looks like you need cummax() instead of .rolling(N).max()

In [29]: df['new'] = df.groupby('id').value.cummax()

In [30]: df
Out[30]:
   id  value  new
0   1      3    3
1   1      6    6
2   1      3    6
3   2      2    2
4   2      1    2
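
For readers who want to run this outside IPython, here is a minimal self-contained sketch of the same idea. It also shows `expanding().max()` as a second route to the running maximum. The variable name `expanding_max` is my own and not from the question. As background on the question's results: an integer window such as `rolling(3)` defaults `min_periods` to the window size, which is why the first two rows of each group came back as NaN, and a window of 1 only ever sees the current row.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3], [1, 6], [1, 3], [2, 2], [2, 1]],
    columns=['id', 'value'],
)

# Running maximum within each group; the result keeps the original row index.
df['new'] = df.groupby('id')['value'].cummax()

# Alternative sketch: an expanding window grows from the first row of each group.
# The grouped result carries an (id, original index) MultiIndex and float dtype,
# so the group level is dropped to line it up with df again.
expanding_max = (
    df.groupby('id')['value']
      .expanding()
      .max()
      .reset_index(level=0, drop=True)
)

print(df)
print(expanding_max)
```

`cummax()` is probably the simpler choice here, since it preserves both the integer dtype and the original index without extra steps.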

Timing (using brand new Pandas version 0.20.1):

In [3]: df = pd.concat([df] * 10**4, ignore_index=True)

In [4]: df.shape
Out[4]: (50000, 2)

In [5]: %timeit df.groupby('id').value.apply(lambda x: x.cummax())
100 loops, best of 3: 15.8 ms per loop

In [6]: %timeit df.groupby('id').value.cummax()
100 loops, best of 3: 4.09 ms per loop

NOTE: from the Pandas 0.20.0 "What's New" notes:

Improved performance of groupby().cummin() and groupby().cummax() (GH15048, GH15109, GH15561, GH15635)
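
The comparison can be reproduced outside IPython with the standard `timeit` module. The following is only a sketch: the function names `with_apply` and `with_cummax` and the repeat count are my own choices, and the absolute numbers will depend on the machine and the Pandas version. `group_keys=False` is passed on the assumption that recent Pandas versions would otherwise prepend the group key to the index of the `apply` result.

```python
import timeit

import pandas as pd

base = pd.DataFrame(
    [[1, 3], [1, 6], [1, 3], [2, 2], [2, 1]],
    columns=['id', 'value'],
)
# Same 50,000-row frame as in the timing above.
big = pd.concat([base] * 10**4, ignore_index=True)


def with_apply():
    # Calls a Python lambda once per group.
    return big.groupby('id', group_keys=False)['value'].apply(lambda s: s.cummax())


def with_cummax():
    # Uses the built-in grouped cumulative maximum.
    return big.groupby('id')['value'].cummax()


# Check that both approaches agree before comparing their speed.
same = with_apply().sort_index().equals(with_cummax())

t_apply = timeit.timeit(with_apply, number=20)
t_cummax = timeit.timeit(with_cummax, number=20)

print(f"results identical: {same}")
print(f"apply:  {t_apply / 20 * 1e3:.2f} ms per call")
print(f"cummax: {t_cummax / 20 * 1e3:.2f} ms per call")
```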


answered Sep 25 '22 03:09

MaxU - stop WAR against UA

In my timing, using apply was a tiny bit faster:

# Using apply  
df['output'] = df.groupby('id').value.apply(lambda x: x.cummax())
%timeit df['output'] = df.groupby('id').value.apply(lambda x: x.cummax())
1000 loops, best of 3: 1.57 ms per loop

Other method:

df['output'] = df.groupby('id').value.cummax()
%timeit df['output'] = df.groupby('id').value.cummax()
1000 loops, best of 3: 1.66 ms per loop

answered Sep 26 '22 03:09

Andrew L
