df:
id c1 c2 c3
101 a b c
102 b c d
103 d e f
101 h i j
102 k l m
I want to select rows by grouping on the id column and keeping only the groups whose count is greater than 1.
The result should contain every row whose id appears more than once.
Expected result:
df:
id c1 c2 c3
101 a b c
102 b c d
101 h i j
102 k l m
I can achieve this with the code below:
g = df.groupby('id').size().reset_index(name='counts')
filt = g.query('counts > 1')
m_filt = df.id.isin(filt.id)
df_filtered = df[m_filt]
Is there a better way of doing this?
You can collect DataFrame rows into a list per group with pandas: call DataFrame.groupby() on the column of interest, select the column whose values you want, and then use apply(list) on that selection to get one list for every group.
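As a minimal sketch of the list-per-group idea, assuming the sample frame from the question (the variable name c1_lists is only illustrative):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [101, 102, 103, 101, 102],
    "c1": ["a", "b", "d", "h", "k"],
    "c2": ["b", "c", "e", "i", "l"],
    "c3": ["c", "d", "f", "j", "m"],
})

# Group on id, pick the c1 column, and turn each group's values into a list.
c1_lists = df.groupby("id")["c1"].apply(list)
print(c1_lists)
```

The result is a Series indexed by id, with one Python list per group.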
To access a group of rows in a pandas DataFrame, use the loc indexer (it is used with square brackets, not called like a method). For example, df.loc[2:5] selects the rows with labels 2 through 5; unlike ordinary Python slicing, both endpoints are included.
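A small sketch of label-based slicing, assuming a frame with the default integer index (the column name value is made up for illustration). It also shows iloc for contrast, since positional slicing excludes the end point:

```python
import pandas as pd

# Eight rows with default labels 0..7.
df = pd.DataFrame({"value": list("abcdefgh")})

# Label-based slice: rows labelled 2, 3, 4 and 5 (end label included).
by_label = df.loc[2:5]

# Position-based slice: positions 2, 3 and 4 (end position excluded).
by_position = df.iloc[2:5]

print(by_label)
print(by_position)
```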
Use DataFrame.head(n) to get the first n rows of the DataFrame. The argument n is optional and defaults to 5.
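For illustration, a short sketch of head with and without the argument, assuming a made-up seven-row frame:

```python
import pandas as pd

# Seven rows so the default of five is visible.
df = pd.DataFrame({"id": [101, 102, 103, 101, 102, 104, 105]})

first_two = df.head(2)     # explicit n
default_head = df.head()   # n defaults to 5

print(first_two)
print(default_head)
```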
Use GroupBy.transform with GroupBy.size to get a Series the same size as the original DataFrame, which makes it possible to filter by boolean indexing:
df[df.groupby('id')['id'].transform('size').gt(1)]
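To make the transform approach concrete, here is a sketch run against the question's sample data. It selects the id column before calling transform so that the result is a Series aligned with the original rows; the names counts and df_filtered are only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [101, 102, 103, 101, 102],
    "c1": ["a", "b", "d", "h", "k"],
    "c2": ["b", "c", "e", "i", "l"],
    "c3": ["c", "d", "f", "j", "m"],
})

# One group size per original row: 2, 2, 1, 2, 2.
counts = df.groupby("id")["id"].transform("size")

# Boolean indexing keeps the rows whose id occurs more than once.
df_filtered = df[counts.gt(1)]
print(df_filtered)
```

Because counts holds the actual group sizes, the same pattern should extend to other thresholds, for example counts.gt(2) for ids that occur at least three times.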
Or, if you need all duplicated rows, use DataFrame.duplicated with keep=False:
df[df.duplicated('id', keep=False)]
Or, similarly, with Series.duplicated:
df[df['id'].duplicated(keep=False)]
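A sketch of the duplicated variants on the question's sample data follows. With keep=False every member of a repeated id is flagged, not only the later occurrences; the variable names are illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [101, 102, 103, 101, 102],
    "c1": ["a", "b", "d", "h", "k"],
    "c2": ["b", "c", "e", "i", "l"],
    "c3": ["c", "d", "f", "j", "m"],
})

# True for every row whose id appears more than once.
mask = df.duplicated("id", keep=False)

via_frame = df[mask]
via_series = df[df["id"].duplicated(keep=False)]

print(via_frame)
print(via_frame.equals(via_series))
```

Note that duplicated only answers "more than once"; for any other count threshold, a group-size approach such as transform('size') is the more general tool.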