Drop duplicates, keep most recent date, Pandas dataframe

I have a Pandas dataframe containing two columns: a datetime column, and a column of integers representing station IDs. I need a new dataframe with the following modifications:

For each set of duplicate STATION_ID values, keep the row with the most recent entry for DATE_CHANGED. If the duplicate entries for the STATION_ID all contain the same DATE_CHANGED then drop the duplicates and retain a single row for the STATION_ID. If there are no duplicates for the STATION_ID value, simply retain the row.

Dataframe (sorted by STATION_ID):

              DATE_CHANGED  STATION_ID
0      2006-06-07 06:00:00           1
1      2000-09-26 06:00:00           1
2      2000-09-26 06:00:00           1
3      2000-09-26 06:00:00           1
4      2001-06-06 06:00:00           2
5      2005-07-29 06:00:00           2
6      2005-07-29 06:00:00           2
7      2001-06-06 06:00:00           2
8      2001-06-08 06:00:00           4
9      2003-11-25 07:00:00           4
10     2001-06-12 06:00:00           7
11     2001-06-04 06:00:00           8
12     2017-04-03 18:36:16           8
13     2017-04-03 18:36:16           8
14     2017-04-03 18:36:16           8
15     2001-06-04 06:00:00           8
16     2001-06-08 06:00:00          10
17     2001-06-08 06:00:00          10
18     2001-06-08 06:00:00          11
19     2001-06-08 06:00:00          11
20     2001-06-08 06:00:00          12
21     2001-06-08 06:00:00          12
22     2001-06-08 06:00:00          13
23     2001-06-08 06:00:00          13
24     2001-06-08 06:00:00          14
25     2001-06-08 06:00:00          14
26     2001-06-08 06:00:00          15
27     2017-08-07 17:48:25          15
28     2001-06-08 06:00:00          15
29     2017-08-07 17:48:25          15
...                    ...         ...
157066 2018-08-06 14:11:28       71655
157067 2018-08-06 14:11:28       71656
157068 2018-08-06 14:11:28       71656
157069 2018-09-11 21:45:05       71664
157070 2018-09-11 21:45:05       71664
157071 2018-09-11 21:45:05       71664
157072 2018-09-11 21:41:04       71664
157073 2018-08-09 15:22:07       71720
157074 2018-08-09 15:22:07       71720
157075 2018-08-09 15:22:07       71720
157076 2018-08-23 12:43:12       71899
157077 2018-08-23 12:43:12       71899
157078 2018-08-23 12:43:12       71899
157079 2018-09-08 20:21:43       71969
157080 2018-09-08 20:21:43       71969
157081 2018-09-08 20:21:43       71969
157082 2018-09-08 20:21:43       71984
157083 2018-09-08 20:21:43       71984
157084 2018-09-08 20:21:43       71984
157085 2018-09-05 18:46:18       71985
157086 2018-09-05 18:46:18       71985
157087 2018-09-05 18:46:18       71985
157088 2018-09-08 20:21:44       71990
157089 2018-09-08 20:21:44       71990
157090 2018-09-08 20:21:44       71990
157091 2018-09-08 20:21:43       72003
157092 2018-09-08 20:21:43       72003
157093 2018-09-08 20:21:43       72003
157094 2018-09-10 17:06:18       72024
157095 2018-09-10 17:15:05       72024

[157096 rows x 2 columns]

DATE_CHANGED is dtype: datetime64[ns]

STATION_ID is dtype: int64

pandas==0.23.4

python==2.7.15

asked Sep 18 '18 23:09

PJW

1 Answer

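Try sorting by DATE_CHANGED, then dropping duplicate STATION_ID values while keeping the last (most recent) row of each: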

df.sort_values('DATE_CHANGED').drop_duplicates('STATION_ID', keep='last')
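To check the behaviour on a small case, here is a minimal sketch that runs the same call on a few rows copied from the question's frame (stations 1, 2, 7 and 10). The variable name `latest` and the trailing `sort_values('STATION_ID')` / `reset_index(drop=True)` steps are my own additions for readable output, not part of the one-liner above.

```python
import pandas as pd

# A handful of rows copied from the question's frame.
df = pd.DataFrame({
    "DATE_CHANGED": pd.to_datetime([
        "2006-06-07 06:00:00",  # station 1, most recent
        "2000-09-26 06:00:00",  # station 1
        "2000-09-26 06:00:00",  # station 1
        "2001-06-06 06:00:00",  # station 2
        "2005-07-29 06:00:00",  # station 2, most recent (tied)
        "2005-07-29 06:00:00",  # station 2, most recent (tied)
        "2001-06-12 06:00:00",  # station 7, no duplicates
        "2001-06-08 06:00:00",  # station 10, identical dates
        "2001-06-08 06:00:00",  # station 10, identical dates
    ]),
    "STATION_ID": [1, 1, 1, 2, 2, 2, 7, 10, 10],
})

latest = (
    df.sort_values("DATE_CHANGED")                # oldest first, newest last
      .drop_duplicates("STATION_ID", keep="last") # keep the newest row per station
      .sort_values("STATION_ID")                  # optional: restore station order
      .reset_index(drop=True)                     # optional: tidy 0..n-1 index
)
print(latest)
```

Each station ends up with exactly one row: station 1 keeps 2006-06-07, station 2 keeps 2005-07-29, and stations 7 and 10 keep their only distinct date. With just these two columns it does not matter which of several tied rows survives. If the frame had extra columns and the choice among ties mattered, passing `kind='mergesort'` to `sort_values` (a stable sort) would probably be the safer option.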

answered Sep 19 '22 13:09

sacuL
