Get rows that have the same value across its columns in pandas

Tags:

python

pandas

dataframe

In pandas, given a DataFrame D:

+-----+--------+--------+--------+   
|     |    1   |    2   |    3   |
+-----+--------+--------+--------+
|  0  | apple  | banana | banana |
|  1  | orange | orange | orange |
|  2  | banana | apple  | orange |
|  3  | NaN    | NaN    | NaN    |
|  4  | apple  | apple  | apple  |
+-----+--------+--------+--------+

How do I return the rows whose contents are the same across all columns, when there are three or more columns, so that the result is this:

+-----+--------+--------+--------+   
|     |    1   |    2   |    3   |
+-----+--------+--------+--------+
|  1  | orange | orange | orange |
|  4  | apple  | apple  | apple  |
+-----+--------+--------+--------+

Note that it skips rows when all values are NaN.

With only two columns I would use D[D[1]==D[2]], but I don't know how to generalize this to DataFrames with more than two columns.
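One way to generalize the two-column comparison is to compare every column against the first column and keep the rows where all of those comparisons hold. The sketch below assumes the column labels 1, 2, 3 shown in the table and uses `DataFrame.eq` with `axis=0` so the first column is matched against each column row by row. Because NaN never compares equal to anything, the all-NaN row drops out automatically; note that any row containing a NaN is dropped as well.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the table above (column labels 1, 2, 3 assumed).
D = pd.DataFrame(
    {
        1: ["apple", "orange", "banana", np.nan, "apple"],
        2: ["banana", "orange", "apple", np.nan, "apple"],
        3: ["banana", "orange", "orange", np.nan, "apple"],
    }
)

# Compare every column with the first column, aligned on the row index.
# NaN never compares equal, so any cell involving NaN comes out False.
same_as_first = D.eq(D.iloc[:, 0], axis=0)

# Keep only the rows where every column matched the first column.
result = D[same_as_first.all(axis=1)]
print(result)
```

This is the n-column analogue of `D[D[1]==D[2]]`: with two columns it reduces to exactly that comparison.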


asked Jan 20 '14 10:01

kentwait

2 Answers

Similar to Andy Hayden's answer, this checks whether each row's minimum equals its maximum (if it does, all elements in the row are identical):

df[df.apply(lambda x: min(x) == max(x), axis=1)]
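A self-contained run of the min/max idea on the question's data might look like the sketch below; the frame construction is an assumption that mirrors the table. The built-in `min` and `max` iterate over the row's values; for the all-NaN row both return NaN, and `NaN == NaN` is False, so that row is excluded. One caveat: in Python 3, a row that mixes strings and NaN (for example `[nan, apple, apple]`) makes `min` compare a float with a str, which raises `TypeError`, so this version only suits rows that are either all strings or all NaN.

```python
import numpy as np
import pandas as pd

# Frame mirroring the question's table (construction assumed).
df = pd.DataFrame(
    {
        1: ["apple", "orange", "banana", np.nan, "apple"],
        2: ["banana", "orange", "apple", np.nan, "apple"],
        3: ["banana", "orange", "orange", np.nan, "apple"],
    }
)

# A row is uniform when its smallest value equals its largest value.
# axis=1 passes each row to the lambda as a Series.
# For the all-NaN row, min and max both return NaN, and NaN == NaN is False.
is_uniform = df.apply(lambda x: min(x) == max(x), axis=1)

result = df[is_uniform]
print(result)
```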


answered Oct 13 '22 00:10

lowtech

My entry:

>>> df
        0       1       2
0   apple  banana  banana
1  orange  orange  orange
2  banana   apple  orange
3     NaN     NaN     NaN
4   apple   apple   apple

[5 rows x 3 columns]
>>> df[df.apply(pd.Series.nunique, axis=1) == 1]
        0       1       2
1  orange  orange  orange
4   apple   apple   apple

[2 rows x 3 columns]

This works because pd.Series.nunique ignores NaN by default, so calling it on each row gives:

>>> df.apply(pd.Series.nunique, axis=1)
0    2
1    1
2    3
3    0
4    1
dtype: int64
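In newer pandas releases, `DataFrame.nunique` accepts an `axis` argument and, like `Series.nunique`, drops NaN by default, so the same per-row counts can be computed without `apply`. A minimal sketch, assuming a frame built to match the data above (default integer column labels 0, 1, 2):

```python
import numpy as np
import pandas as pd

# Frame matching the example above (construction assumed).
df = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        ["banana", "apple", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
    ]
)

# Count distinct non-NaN values in each row (dropna=True is the default).
counts = df.nunique(axis=1)
print(counts.tolist())

# Rows with exactly one distinct value are the uniform ones;
# the all-NaN row has a count of 0, so it is filtered out.
result = df[counts == 1]
print(result)
```

The vectorized call avoids invoking a Python function once per row, which tends to matter on large frames.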

Note: this would, however, keep rows which look like [nan, nan, apple] or [nan, apple, apple]. Usually I want that, but that might be the wrong answer for your use case.
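If rows such as `[nan, apple, apple]` should be dropped instead, one option is to additionally require that no value in the row is missing. This is a sketch; rows 5 and 6 below are hypothetical additions used only to show the difference between the two filters.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        ["banana", "apple", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
        [np.nan, np.nan, "apple"],   # hypothetical partially missing row
        [np.nan, "apple", "apple"],  # hypothetical partially missing row
    ]
)

one_value = df.nunique(axis=1) == 1

# Lenient filter: NaN is ignored, so partially missing rows are kept.
lenient = df[one_value]

# Strict filter: additionally require every value in the row to be present.
strict = df[one_value & df.notna().all(axis=1)]

print(lenient.index.tolist())
print(strict.index.tolist())
```

Using `nunique(axis=1, dropna=False)` alone would not work here, because the all-NaN row would then count as one distinct value and be kept.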

answered Oct 12 '22 23:10

DSM
