I have a table like this:
CustID  Purchase  Time
A       Item1     01/01/2011
B       Item2     01/01/2011
C       Item1     01/02/2011
A       Item2     03/01/2011
I would like to select the rows whose CustID appears more than once in the table.
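For reference, a minimal way to rebuild the example frame (column names are taken from the table above; Time is kept as plain strings here):

import pandas as pd

df = pd.DataFrame({
    'CustID':   ['A', 'B', 'C', 'A'],
    'Purchase': ['Item1', 'Item2', 'Item1', 'Item2'],
    'Time':     ['01/01/2011', '01/01/2011', '01/02/2011', '03/01/2011'],
})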
This could work:
# Count occurrences of each CustID, then keep the rows whose ID occurs more than once
counts = df['CustID'].value_counts()
df[df['CustID'].isin(counts.index[counts > 1])]
Result:

  CustID Purchase        Time
0      A    Item1  01/01/2011
3      A    Item2  03/01/2011
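As a quick sanity check, counts maps each CustID to its frequency, so only A passes the counts > 1 filter (exact Series labels vary slightly across pandas versions):

counts = df['CustID'].value_counts()
print(counts)
# A    2
# B    1
# C    1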
df[df['CustID'].duplicated(keep=False)]
This finds the rows in the data frame where the CustID value is duplicated. Passing keep=False tells duplicated to mark every duplicated row as True (as opposed to leaving just the first or last occurrence unmarked):
  CustID Purchase        Time
0      A    Item1  01/01/2011
3      A    Item2  03/01/2011
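For comparison, here is how the keep argument changes the mask on the example frame (the commented values are what the example data should produce):

df['CustID'].duplicated()               # keep='first' (default): [False, False, False, True]
df['CustID'].duplicated(keep='last')    # [True, False, False, False]
df['CustID'].duplicated(keep=False)     # [True, False, False, True]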
EDIT
Looking at the docs for duplicated, it looks like you can also do:
df[df.duplicated('CustID', keep=False)]
Though this seems to be about 100 µs slower than the original (458 µs for the original vs. 545 µs for this version, based on the example dataframe).
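A sketch of how such a timing comparison can be run with the stdlib timeit module (assumes df is defined as above; %timeit in an IPython session works too):

import timeit

t1 = timeit.timeit("df[df['CustID'].duplicated(keep=False)]", globals=globals(), number=1000)
t2 = timeit.timeit("df[df.duplicated('CustID', keep=False)]", globals=globals(), number=1000)
print(t1 / 1000, t2 / 1000)  # average seconds per call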
Use filter
# Keep only the CustID groups that contain more than one row
df.groupby('CustID').filter(lambda x: len(x) > 1)
  CustID Purchase        Time
0      A    Item1  01/01/2011
3      A    Item2  03/01/2011
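The lambda receives each group as a sub-DataFrame, so len(x) is simply the group's row count; the sizes can be inspected directly:

df.groupby('CustID').size()
# CustID
# A    2
# B    1
# C    1
# dtype: int64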