I have a Pandas dataframe df for which I want to find all rows where the value of column A is the same but the value of column B is different, e.g.:
|   | A | B |
|---|---|---|
| 0 | 2 | x |
| 1 | 2 | y |
I know I can use pd.concat(g for _, g in df.groupby('A') if len(g) > 1) to get the rows with duplicate values of A, but how do I add the second constraint?
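For concreteness, here is a runnable version of that first step; the small frame below just mirrors the table above:

import pandas as pd

# Example frame from the table above: A repeats, B differs.
df = pd.DataFrame({'A': [2, 2], 'B': ['x', 'y']})

# Keeps every group of rows that share a value of A, regardless of B.
print(pd.concat(g for _, g in df.groupby('A') if len(g) > 1))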
You can use df[df.duplicated()] without any arguments to select rows that repeat an earlier row across all columns; it uses the defaults subset=None and keep='first'. The pandas.DataFrame.duplicated() method returns a boolean Series in which True marks a row identical to a previous one. To find duplicates on a specific column, call duplicated() on that column instead. Relatedly, the equals() method tests whether two Series or DataFrames contain the same elements: the same shape and values, with NaNs in the same location considered equal.
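A minimal sketch of those methods side by side (the example frame here is an assumption for illustration):

import pandas as pd

df = pd.DataFrame({'A': [2, 2, 2, 3], 'B': ['x', 'x', 'y', 'z']})

# Rows that repeat an earlier row across all columns
# (defaults subset=None, keep='first'): row 1 duplicates row 0.
print(df[df.duplicated()])

# Duplicates on a single column: True marks entries identical
# to an earlier entry in that column.
print(df['A'].duplicated())   # False, True, True, False

# equals() compares shape and elements; NaNs in the same
# location are considered equal.
print(df['B'].equals(df['B'].copy()))   # True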
Thinking about this, it makes sense to call unique on the groupby:
In [213]:
import pandas as pd
df = pd.DataFrame({'A': 2, 'B': list('xxyzz')})
df

Out[213]:
   A  B
0  2  x
1  2  x
2  2  y
3  2  z
4  2  z

In [229]:
df.groupby('A')['B'].apply(lambda x: x.unique()).reset_index()

Out[229]:
   A          B
0  2  [x, y, z]
You can then filter the groupby, keeping all rows of each group whose B values are not all identical:

df.groupby('A').filter(lambda x: len(x['B'].unique()) > 1)
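Applied to the two-row frame from the question, this keeps both rows, since their shared A group contains two distinct B values (len(x['B'].unique()) can equivalently be written x['B'].nunique()):

import pandas as pd

df = pd.DataFrame({'A': [2, 2], 'B': ['x', 'y']})

# Keep all rows of every A-group in which B takes more than one value.
print(df.groupby('A').filter(lambda x: len(x['B'].unique()) > 1))
#    A  B
# 0  2  x
# 1  2  y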