I have a df:

id    val1    val2
1     1.1     2.2
1     1.1     2.2
2     2.1     5.5
3     8.8     6.2
4     1.1     2.2
5     8.8     6.2

I want to group by val1 and val2 and get a similar dataframe containing only the rows whose (val1, val2) combination occurs more than once.
Final df:

id    val1    val2
1     1.1     2.2
4     1.1     2.2
3     8.8     6.2
5     8.8     6.2
Find duplicate rows based on all columns: to find and select all duplicate rows across all columns, call DataFrame.duplicated() without any subset argument. It returns a Boolean Series with True at each duplicated row except its first occurrence (the default value of the keep argument is 'first').
The pandas.DataFrame.duplicated() method is used to find duplicate rows in a DataFrame. It returns a boolean Series identifying whether each row is a duplicate or unique.
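For instance, on the question's data, the default keep='first' and keep=False flag different rows (a minimal sketch):

```python
import pandas as pd

# Question's data: several rows share the same (val1, val2) pair
df = pd.DataFrame({'id':   [1, 1, 2, 3, 4, 5],
                   'val1': [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
                   'val2': [2.2, 2.2, 5.5, 6.2, 2.2, 6.2]})

# Default keep='first': the first occurrence in each duplicate group is NOT flagged
print(df.duplicated(subset=['val1', 'val2']).tolist())
# [False, True, False, False, True, True]

# keep=False: every member of a duplicate group is flagged
print(df.duplicated(subset=['val1', 'val2'], keep=False).tolist())
# [True, True, False, True, True, True]
```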
Pandas comes with a whole host of SQL-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.
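In that groupby spirit, the same filter can also be expressed by counting group sizes with transform and keeping rows whose group has more than one member (a sketch equivalent to the duplicated approach below):

```python
import pandas as pd

df = pd.DataFrame({'id':   [1, 1, 2, 3, 4, 5],
                   'val1': [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
                   'val2': [2.2, 2.2, 5.5, 6.2, 2.2, 6.2]})

# transform('size') broadcasts each (val1, val2) group's row count back to every row,
# so the mask is True wherever the combination occurs more than once
mask = df.groupby(['val1', 'val2'])['val1'].transform('size') > 1
print(df[mask])
```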
You need duplicated with the subset parameter to specify which columns to check, and keep=False to mark all duplicates in the mask; then filter by boolean indexing:
df = df[df.duplicated(subset=['val1','val2'], keep=False)]
print (df)
   id  val1  val2
0   1   1.1   2.2
1   1   1.1   2.2
3   3   8.8   6.2
4   4   1.1   2.2
5   5   8.8   6.2

Detail:
print (df.duplicated(subset=['val1','val2'], keep=False))
0     True
1     True
2    False
3     True
4     True
5     True
dtype: bool
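The question's final df also groups identical (val1, val2) pairs next to each other; a sort_values after the filter reproduces that ordering (a sketch; kind='stable' keeps the original row order within ties):

```python
import pandas as pd

df = pd.DataFrame({'id':   [1, 1, 2, 3, 4, 5],
                   'val1': [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
                   'val2': [2.2, 2.2, 5.5, 6.2, 2.2, 6.2]})

# Keep all rows whose (val1, val2) pair repeats, then sort so equal pairs sit together
out = (df[df.duplicated(subset=['val1', 'val2'], keep=False)]
         .sort_values(['val1', 'val2'], kind='stable')
         .reset_index(drop=True))
print(out)
```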