I have a df:

id  val1  val2
 1   1.1   2.2
 1   1.1   2.2
 2   2.1   5.5
 3   8.8   6.2
 4   1.1   2.2
 5   8.8   6.2
I want to group by val1 and val2 and get a similar dataframe containing only the rows whose val1 and val2 combination occurs multiple times.
Final df:

id  val1  val2
 1   1.1   2.2
 4   1.1   2.2
 3   8.8   6.2
 5   8.8   6.2
Find duplicate rows based on all columns: to find and select all duplicate rows based on all columns, call DataFrame.duplicated() without any subset argument. It returns a Boolean Series with True at the position of each duplicated row except its first occurrence (the default value of the keep argument is 'first').
The pandas.DataFrame.duplicated() method is used to find duplicate rows in a DataFrame. It returns a boolean Series that identifies whether each row is a duplicate or unique.
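For example, on the sample data above (a minimal sketch; the column names come from the question):

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 3, 4, 5],
                   'val1': [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
                   'val2': [2.2, 2.2, 5.5, 6.2, 2.2, 6.2]})

# With no subset, every column (including id) must match, so only
# the two fully identical id=1 rows count as duplicates; keep='first'
# leaves the first occurrence marked False
print(df.duplicated())
0    False
1     True
2    False
3    False
4    False
5    False
dtype: bool

This is why subset matters here: the id column differs between rows that share the same val1/val2 values.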
Pandas comes with a whole host of SQL-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.
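As a sketch of that groupby route applied to this question (reusing the df built above; transform('size') is just one of several aggregations that would work):

# Count rows per (val1, val2) combination and keep combinations
# that appear more than once -- same result as keep=False below
counts = df.groupby(['val1', 'val2'])['val1'].transform('size')
print(df[counts > 1])
   id  val1  val2
0   1   1.1   2.2
1   1   1.1   2.2
3   3   8.8   6.2
4   4   1.1   2.2
5   5   8.8   6.2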
You need duplicated with the parameter subset to specify which columns to check, and keep=False to mark all duplicates; then filter with the resulting mask by boolean indexing:
df = df[df.duplicated(subset=['val1','val2'], keep=False)]
print (df)
   id  val1  val2
0   1   1.1   2.2
1   1   1.1   2.2
3   3   8.8   6.2
4   4   1.1   2.2
5   5   8.8   6.2
Detail:
print (df.duplicated(subset=['val1','val2'], keep=False))
0     True
1     True
2    False
3     True
4     True
5     True
dtype: bool
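If you also want the duplicate groups listed together, as in the desired output, sorting on the checked columns after filtering would do it (a minimal sketch):

print(df.sort_values(['val1', 'val2']))
   id  val1  val2
0   1   1.1   2.2
1   1   1.1   2.2
4   4   1.1   2.2
3   3   8.8   6.2
5   5   8.8   6.2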