Keeping the last N duplicates in pandas

Tags: python, pandas, dataframe, drop-duplicates

Given a dataframe:

>>> import pandas as pd
>>> lol = [['a', 1, 1], ['b', 1, 2], ['c', 1, 4], ['c', 2, 9], ['b', 2, 10], ['x', 2, 5], ['d', 2, 3], ['e', 3, 5], ['d', 2, 10], ['a', 3, 5]]
>>> df = pd.DataFrame(lol)
>>> df = df.rename(columns={0:'value', 1:'key', 2:'something'})
>>> df
  value  key  something
0     a    1          1
1     b    1          2
2     c    1          4
3     c    2          9
4     b    2         10
5     x    2          5
6     d    2          3
7     e    3          5
8     d    2         10
9     a    3          5

The goal is to keep the last N rows for each unique value of the key column.

If N=1, I could simply use the .drop_duplicates() function as such:

>>> df.drop_duplicates(subset='key', keep='last')
  value  key  something
2     c    1          4
8     d    2         10
9     a    3          5

How do I keep the last 3 rows for each unique value of key?

I could try this for N=3:

>>> from itertools import chain
>>> unique_keys = {k:[] for k in df['key']}
>>> for idx, row in df.iterrows():
...     k = row['key']
...     unique_keys[k].append(list(row))
...
>>> df = pd.DataFrame(list(chain(*[v[-3:] for v in unique_keys.values()])))
>>> df.rename(columns={0:'value', 1:'key', 2:'something'})
  value  key  something
0     a    1          1
1     b    1          2
2     c    1          4
3     x    2          5
4     d    2          3
5     d    2         10
6     e    3          5
7     a    3          5

But there must be a better way...

asked Oct 17 '17 01:10

alvas

2 Answers

Is this what you want? groupby('key').tail(3) keeps the last three rows of each key and preserves the original row order:

df.groupby('key').tail(3)
Out[127]: 
  value  key  something
0     a    1          1
1     b    1          2
2     c    1          4
5     x    2          5
6     d    2          3
7     e    3          5
8     d    2         10
9     a    3          5
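As a minimal sketch of how this generalizes to any N, the call can be wrapped in a small helper. The name keep_last_n and its parameters are hypothetical, chosen only for illustration; the sketch also assumes the columns have already been named value, key and something as in the question.

```python
import pandas as pd

lol = [['a', 1, 1], ['b', 1, 2], ['c', 1, 4], ['c', 2, 9], ['b', 2, 10],
       ['x', 2, 5], ['d', 2, 3], ['e', 3, 5], ['d', 2, 10], ['a', 3, 5]]
df = pd.DataFrame(lol, columns=['value', 'key', 'something'])


def keep_last_n(frame, column, n):
    """Hypothetical helper: last n rows of each group of `column`.

    Rows keep their original index and their original order.
    """
    return frame.groupby(column).tail(n)


# N=3 reproduces the result shown above.
last3 = keep_last_n(df, 'key', 3)
print(last3)

# N=1 selects the same rows as drop_duplicates(..., keep='last').
last1 = keep_last_n(df, 'key', 1)
print(last1.equals(df.drop_duplicates(subset='key', keep='last')))
```

Because the original index is kept, a call to reset_index(drop=True) on the result would renumber the rows from 0 if that is preferred.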

answered Oct 20 '22 19:10

BENY

Does this help? Iterating over the groups and slicing each one also works (shown here for N=2):

for k, v in df.groupby('key'):
    print(v[-2:])

  value  key  something
1     b    1          2
2     c    1          4
  value  key  something
6     d    2          3
8     d    2         10
  value  key  something
7     e    3          5
9     a    3          5
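The loop above only prints each slice. A sketch of one way to collect the slices back into a single dataframe, assuming the same named columns as in the question, is to concatenate them with pd.concat; the variable names n, pieces and result are illustrative only.

```python
import pandas as pd

lol = [['a', 1, 1], ['b', 1, 2], ['c', 1, 4], ['c', 2, 9], ['b', 2, 10],
       ['x', 2, 5], ['d', 2, 3], ['e', 3, 5], ['d', 2, 10], ['a', 3, 5]]
df = pd.DataFrame(lol, columns=['value', 'key', 'something'])

n = 2  # number of trailing rows to keep per key

# Each group is itself a DataFrame; iloc[-n:] takes its last n rows.
pieces = [group.iloc[-n:] for _, group in df.groupby('key')]

# Stitch the per-group slices back together. Rows come out grouped by key,
# not in their original order.
result = pd.concat(pieces)
print(result)

# sort_index() restores the original row order if that matters.
print(result.sort_index())
```

Note that the concatenated result is ordered by key, so sorting by index is needed to match what groupby('key').tail(n) returns.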

answered Oct 20 '22 18:10

Merlin
