Pandas keep duplicated with highest value

Tags:

python

pandas

duplicates

python-2.7

I have data similar to:

id value duplicate
a   200  yes
a   12   yes
b   42   yes
c   12   no
b   532  yes
b   21   yes
...

To track the duplicates I use df['duplicate'] = df.duplicated('id', keep=False). However, I would like to keep the row with the highest value for each id and either mark or drop the other duplicates. Any suggestions?


asked Oct 29 '15 20:10

As3adTintin

1 Answer

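Ah, I don't know why I didn't think of this first. Sort by id with the highest value first, then flag every row after the first one within each id:

df = df.sort_values(['id', 'value'], ascending=[True, False])
df['is_duplicated'] = df.duplicated('id', keep='first')

Rows where is_duplicated is True are the lower-value duplicates. df.sort has been removed from pandas, so sort_values is used here, and its result has to be assigned because it returns a new frame. Sorry!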
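For anyone who wants to try this on the sample rows, here is a minimal sketch of both the "mark" and the "drop" variant. It assumes a reasonably recent pandas and rebuilds only the six rows shown in the question; the names ranked, marked and deduped are made up for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the rows shown in the question
# (the rows hidden behind "..." are not included).
df = pd.DataFrame({
    "id": ["a", "a", "b", "c", "b", "b"],
    "value": [200, 12, 42, 12, 532, 21],
})

# Put the highest value first within each id.
# sort_values returns a new frame, so the result is assigned.
ranked = df.sort_values(["id", "value"], ascending=[True, False])

# Variant 1: mark every row except the highest-value one per id.
marked = ranked.assign(is_duplicated=ranked.duplicated("id", keep="first"))

# Variant 2: drop the lower-value rows instead of marking them.
deduped = ranked.drop_duplicates("id", keep="first")

print(marked)
print(deduped)
```

If two rows of the same id tie for the highest value, only the first of them after sorting is kept, so add another sort column if the tie-break matters.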

answered Sep 18 '22 01:09

As3adTintin

