Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas select rows if ID appear several time

I have a table like this:

CustID  Purchase  Time
A       Item1     01/01/2011
B       Item2     01/01/2011   
C       Item1     01/02/2011   
A       Item2     03/01/2011   

I would like to select rows with CustID appear more than 1 in the table.

like image 465
Hai Vu Avatar asked Apr 11 '17 14:04

Hai Vu


3 Answers

This could work:

counts = df['CustID'].value_counts()
df[df['CustID'].isin(counts.index[counts > 1])]

Result:

  CustID Purchase        Time
0      A    Item1  01/01/2011
3      A    Item2  03/01/2011
like image 106
languitar Avatar answered Oct 21 '22 12:10

languitar


df[df['CustID'].duplicated(keep=False)]

This finds the rows in the data frame where there exist duplicates in the CustID column. The keep=False tells the duplicated function to mark all duplicate rows as True (as opposed to just the first or last ones):

  CustID Purchase        Time
0      A    Item1  01/01/2011
3      A    Item2  03/01/2011

EDIT

Looking at the docs for duplicated it looks like you can also do:

df[df.duplicated('CustID', keep=False)]

Though this seems to be about 100 µs slower than the original (458 µs vs. 545 µs based on the example dataframe)

like image 41
bunji Avatar answered Oct 21 '22 10:10

bunji


Use filter

df.groupby('CustID').filter(lambda x: len(x) > 1)
  CustID Purchase        Time
0      A    Item1  01/01/2011
3      A    Item2  03/01/2011
like image 45
piRSquared Avatar answered Oct 21 '22 11:10

piRSquared