I have a table like this:
CustID  Purchase  Time
A       Item1     01/01/2011
B       Item2     01/01/2011
C       Item1     01/02/2011
A       Item2     03/01/2011
I would like to select the rows whose CustID appears more than once in the table.
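For reference, a minimal way to rebuild the example frame (column names are taken from the table above; Time is kept as plain strings here):

import pandas as pd

df = pd.DataFrame({
    'CustID':   ['A', 'B', 'C', 'A'],
    'Purchase': ['Item1', 'Item2', 'Item1', 'Item2'],
    'Time':     ['01/01/2011', '01/01/2011', '01/02/2011', '03/01/2011'],
})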
This could work:
# Count occurrences of each CustID, then keep the rows whose ID occurs more than once
counts = df['CustID'].value_counts()
df[df['CustID'].isin(counts.index[counts > 1])]
Result:

  CustID Purchase        Time
0      A    Item1  01/01/2011
3      A    Item2  03/01/2011
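As a quick sanity check, counts maps each CustID to its frequency, so only A passes the counts > 1 filter (exact Series labels vary slightly across pandas versions):

counts = df['CustID'].value_counts()
print(counts)
# A    2
# B    1
# C    1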
df[df['CustID'].duplicated(keep=False)]
This finds the rows in the data frame where the CustID value is duplicated. Passing keep=False tells duplicated to mark every duplicated row as True (as opposed to leaving just the first or last occurrence unmarked):
  CustID Purchase        Time
0      A    Item1  01/01/2011
3      A    Item2  03/01/2011
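For comparison, here is how the keep argument changes the mask on the example frame (the commented values are what the example data should produce):

df['CustID'].duplicated()               # keep='first' (default): [False, False, False, True]
df['CustID'].duplicated(keep='last')    # [True, False, False, False]
df['CustID'].duplicated(keep=False)     # [True, False, False, True]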
EDIT
Looking at the docs for duplicated, it looks like you can also do:
df[df.duplicated('CustID', keep=False)]
Though this seems to be about 100 µs slower than the original (458 µs for the original vs. 545 µs for this version, based on the example dataframe).
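A sketch of how such a timing comparison can be run with the stdlib timeit module (assumes df is defined as above; %timeit in an IPython session works too):

import timeit

t1 = timeit.timeit("df[df['CustID'].duplicated(keep=False)]", globals=globals(), number=1000)
t2 = timeit.timeit("df[df.duplicated('CustID', keep=False)]", globals=globals(), number=1000)
print(t1 / 1000, t2 / 1000)  # average seconds per call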
Use filter
# Keep only the CustID groups that contain more than one row
df.groupby('CustID').filter(lambda x: len(x) > 1)
  CustID Purchase        Time
0      A    Item1  01/01/2011
3      A    Item2  03/01/2011
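The lambda receives each group as a sub-DataFrame, so len(x) is simply the group's row count; the sizes can be inspected directly:

df.groupby('CustID').size()
# CustID
# A    2
# B    1
# C    1
# dtype: int64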