pandas - get most recent value of a particular column indexed by another column (get maximum value of a particular column indexed by another column)

I have the following dataframe:

   obj_id   data_date   value
0  4        2011-11-01  59500
1  2        2011-10-01  35200
2  4        2010-07-31  24860
3  1        2009-07-28  15860
4  2        2008-10-15  200200

I want to get a subset of this data so that I only have the most recent (largest 'data_date') 'value' for each 'obj_id'.

I've hacked together a solution, but it feels dirty. I was wondering if anyone has a better way. I'm sure I must be missing some easy way to do it through pandas.

My method is essentially to group, sort, retrieve, and recombine as follows:

import pandas as pd

row_arr = []
for grp, grp_df in df.groupby('obj_id'):
    # sort each group newest-first and keep its top row
    row_arr.append(grp_df.sort_values('data_date', ascending=False)[:1].values[0])

df_new = pd.DataFrame(row_arr, columns=('obj_id', 'data_date', 'value'))

asked Mar 24 '12 10:03

enrishi

2 Answers

If the number of "obj_id"s is very high you'll want to sort the entire dataframe and then drop duplicates to get the last element.

# sort_values replaces the removed sort_index(by=...) form
sorted_df = df.sort_values('data_date')
result = sorted_df.drop_duplicates('obj_id', keep='last').values

This should be faster (sorry, I didn't test it) because you don't need a custom aggregation function, which is slow when there is a large number of keys. You might think sorting the entire dataframe is worse, but in practice sorts in Python are fast and native loops are slow.
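As a rough, self-contained sketch of the sort-then-drop-duplicates idea, the snippet below rebuilds the sample data from the question and keeps the latest row per obj_id. It assumes data_date has been parsed with pd.to_datetime so the sort is chronological; the variable name latest and the final sort by obj_id are only there for readability and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; dates are parsed to datetime
# (an assumption) so that sorting is chronological.
df = pd.DataFrame({
    "obj_id": [4, 2, 4, 1, 2],
    "data_date": pd.to_datetime(
        ["2011-11-01", "2011-10-01", "2010-07-31", "2009-07-28", "2008-10-15"]
    ),
    "value": [59500, 35200, 24860, 15860, 200200],
})

# Sort ascending by date, then keep only the last (most recent) row per obj_id.
latest = (
    df.sort_values("data_date")
      .drop_duplicates(subset="obj_id", keep="last")
      .sort_values("obj_id")          # cosmetic: order the result by obj_id
      .reset_index(drop=True)
)
print(latest)
```

Unlike the answer's snippet, this keeps the result as a DataFrame rather than calling .values, which is usually more convenient if the column names are still needed.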

answered Oct 08 '22 15:10

thetainted1

This is another possible solution. I don't know whether it is the fastest (I doubt it), since I have not benchmarked it against other approaches.

df.loc[df.groupby('obj_id').data_date.idxmax(),:]
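To make the one-liner easier to follow, here is a small sketch on the question's sample data that splits it into two steps. It assumes data_date is a datetime column and that the dataframe's index labels are unique, since df.loc selects rows by the labels idxmax returns; the names latest_idx and latest are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "obj_id": [4, 2, 4, 1, 2],
    "data_date": ["2011-11-01", "2011-10-01", "2010-07-31", "2009-07-28", "2008-10-15"],
    "value": [59500, 35200, 24860, 15860, 200200],
})
# Assumption: parse the strings so the maximum is the latest date.
df["data_date"] = pd.to_datetime(df["data_date"])

# For each obj_id, idxmax gives the index label of the row with the latest date.
latest_idx = df.groupby("obj_id")["data_date"].idxmax()

# Select those rows; the original index labels are preserved.
latest = df.loc[latest_idx]
print(latest)
```

If the index contains duplicate labels, calling df.reset_index(drop=True) first is one way to make the label lookup unambiguous.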

answered Oct 08 '22 14:10

pdifranc
