Filtering dataframe based on column value_counts (pandas)

Question

I'm trying out pandas for the first time. I have a dataframe with two columns: user_id and string. Each user_id may have several strings, so it can show up in the dataframe multiple times. I want to derive another dataframe from this one, listing only those user_ids that have at least 2 strings associated with them.

I tried df[df['user_id'].value_counts() > 1], which I thought was the standard way to do this, but it raises IndexingError: Unalignable boolean Series key provided. Can someone clear up my misunderstanding and provide the correct alternative?

jezrael · Accepted Answer

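I think you need transform, because the boolean mask must have the same index as df. value_counts returns a Series indexed by the unique user_id values rather than by the rows of df, so the mask cannot be aligned and pandas raises the error.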

df[df.groupby('user_id')['user_id'].transform('size') > 1]
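To make the index difference concrete, here is a minimal sketch on made-up data. The sample values are hypothetical, and the map-based variant at the end is an alternative I'm adding for comparison, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "string": ["a", "b", "c", "d", "e", "f"],
})

# value_counts() is indexed by the unique user_id values, not by df's row
# labels, so a comparison on it cannot serve directly as a row mask for df.
counts = df["user_id"].value_counts()

# transform('size') broadcasts each group's size back onto the original rows,
# so the resulting boolean Series shares df's index.
mask = df.groupby("user_id")["user_id"].transform("size") > 1
filtered = df[mask]

# Alternative (assumption, not from the answer): look up each row's count by
# mapping user_id through the value_counts result.
filtered_via_map = df[df["user_id"].map(counts) > 1]
```

With this sample, both filtered and filtered_via_map keep the rows for user_id 1 and 3 and drop the single row for user_id 2.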


Tags:

python

pandas

Hassan Baig

1 Answers

jezrael
