I'm trying out pandas for the first time. I have a dataframe with two columns: `user_id` and `string`. Each `user_id` may have several strings, so it can show up in the dataframe multiple times. I want to derive another dataframe from this, one that lists only those `user_id`s that have at least two strings associated with them.
I tried `df[df['user_id'].value_counts() > 1]`, which I thought was the standard way to do this, but it raises `IndexingError: Unalignable boolean Series key provided`. Can someone clear up my misunderstanding and provide the correct alternative?
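I think you need `transform`, because the boolean mask must have the same index as `df`. `value_counts` returns a Series indexed by the unique `user_id` values rather than by the rows of `df`, so the mask cannot be aligned with `df` and pandas raises the error.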
df[df.groupby('user_id')['user_id'].transform('size') > 1]
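To make the index mismatch concrete, here is a minimal sketch on a made-up dataframe (the sample values are assumptions for illustration, not data from the question). It builds the mask with `transform('size')`, and also shows one possible alternative that maps the `value_counts` result back onto the rows with `Series.map`:

```python
import pandas as pd

# Hypothetical sample data: user 2 has one string, users 1 and 3 have several.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "string": ["a", "b", "c", "d", "e", "f"],
})

# value_counts() is indexed by the unique user_id values (1, 2, 3),
# not by the row labels of df (0..5), so it cannot be used as a row mask.
counts = df["user_id"].value_counts()

# transform('size') broadcasts each group's size back to every row,
# so the resulting mask shares df's index.
mask = df.groupby("user_id")["user_id"].transform("size") > 1
result = df[mask]

# Alternative: look up each row's user_id in counts, which also
# yields a Series aligned with df.
result_via_map = df[df["user_id"].map(counts) > 1]

print(result)
```

Both approaches keep every row of the users that appear more than once. If you only need the distinct ids, `result['user_id'].unique()` should give them.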