I have a dataframe with monthly records for different IDs, and I need to do some analysis only on the IDs that have multiple months of records.
ID Month Metric1 Metric2
1 2018-01-01 4 3
1 2018-02-01 3 2
2 2018-02-01 1 5
3 2018-01-01 4 2
3 2018-02-01 6 3
4 2018-01-01 3 1
How would I filter out the rows of ID that only appear once and keep those with multiple rows and get a result like
ID Month Metric1 Metric2
1 2018-01-01 4 3
1 2018-02-01 3 2
3 2018-01-01 4 2
3 2018-02-01 6 3
I've looked at some other pages that mention using something like:
df = df[df.groupby('ID').ID.transform(len) > 1]
But I don't want to lose the metrics from each month by grouping.
Change the len
to count
df[df.groupby('ID').ID.transform('count') > 1]
Out[589]:
ID Month Metric1 Metric2
0 1 2018-01-01 4 3
1 1 2018-02-01 3 2
3 3 2018-01-01 4 2
4 3 2018-02-01 6 3
Try with pd.series.duplicated()
:
df1=df[df.ID.duplicated(keep=False)]
print(df1)
ID Month Metric1 Metric2
0 1 2018-01-01 4 3
1 1 2018-02-01 3 2
3 3 2018-01-01 4 2
4 3 2018-02-01 6 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With