This is rather simple but I can't get my head around it. Let's say for the following data frame, I want to keep only the rows with duplicated values in column y:
>>> df
x y
0 1 1
1 2 2
2 3 2
3 4 3
4 5 3
5 6 3
6 7 5
7 8 2
The desired output looks like:
>>> df
x y
1 2 2
2 3 2
3 4 3
4 5 3
5 6 3
7 8 2
I tried this:
df[~df.duplicated('y')]
but I get this:
x y
0 1 1
1 2 2
3 4 3
6 7 5
As an aside, the related method drop_duplicates() does the opposite of what is asked here: by default it returns a new DataFrame with the duplicate rows removed, keeping the first occurrence of each value (set inplace=True to modify the original DataFrame instead). Passing keep=False to drop_duplicates() removes every row whose value is duplicated, e.g. df.drop_duplicates(keep=False). You can also restrict the duplicate check to specific columns: df = df.drop_duplicates(subset=['col1', 'col2', ...]).
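For contrast, here is a small sketch of how drop_duplicates() behaves on a toy frame (not the question's data); the variable name toy is just illustrative:

```python
import pandas as pd

# Toy frame: the value 2 appears twice, 1 and 3 appear once
toy = pd.DataFrame({"y": [1, 2, 2, 3]})

# Default keep='first': one representative of each value survives
print(toy.drop_duplicates('y')['y'].tolist())             # [1, 2, 3]

# keep=False: duplicated values are dropped entirely
print(toy.drop_duplicates('y', keep=False)['y'].tolist())  # [1, 3]
```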
Docs: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html
keep : {‘first’, ‘last’, False}, default ‘first’
first : Mark duplicates as True except for the first occurrence.
last : Mark duplicates as True except for the last occurrence.
False : Mark all duplicates as True.
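The three keep options above can be seen side by side on a small Series:

```python
import pandas as pd

s = pd.Series([2, 2, 3, 3, 3, 5])

# keep='first' (default): only later occurrences are flagged
print(s.duplicated().tolist())             # [False, True, False, True, True, False]

# keep='last': only earlier occurrences are flagged
print(s.duplicated(keep='last').tolist())  # [True, False, True, True, False, False]

# keep=False: every member of a duplicated group is flagged
print(s.duplicated(keep=False).tolist())   # [True, True, True, True, True, False]
```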
Meaning you are looking for:
df[df.duplicated('y', keep=False)]
Output:
x y
1 2 2
2 3 2
3 4 3
4 5 3
5 6 3
7 8 2
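Putting it all together, a runnable reproduction of the question's frame and the answer:

```python
import pandas as pd

# Reproduce the example frame from the question
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8],
                   "y": [1, 2, 2, 3, 3, 3, 5, 2]})

# keep=False marks every member of a duplicated group as True,
# so the mask selects all rows whose y value appears more than once
dupes = df[df.duplicated('y', keep=False)]
print(dupes)
```

The rows with y equal to 1 and 5 (each occurring only once) are the only ones dropped.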