I have a DataFrame with three columns: Date, Advertiser and ID. I grouped the data first to see whether the volumes of some Advertisers are too small (for example, where count() is less than 500), and then I want to drop those rows from the grouped table.
df.groupby(['Date','Advertiser']).ID.count()
The result looks like this:
Date     Advertiser
2016-01  A    50000
         B       50
         C     4000
         D    24000
2016-02  A     6800
         B     7800
         C      123
2016-03  B     1111
         E     8600
         F      500
I want the result to be this:
Date     Advertiser
2016-01  A    50000
         C     4000
         D    24000
2016-02  A     6800
         B     7800
2016-03  B     1111
         E     8600
Follow-up question:
What if I want to filter out rows in the groupby based on the total count() within each Date category? For example, I want to keep only dates whose total count() is larger than 15000. The table I want looks like this:
Date     Advertiser
2016-01  A    50000
         B       50
         C     4000
         D    24000
2016-02  A     6800
         B     7800
         C      123
You have a Series object after the groupby, which can be filtered based on value with a chained lambda filter:
df.groupby(['Date','Advertiser']).ID.count()[lambda x: x >= 500]
#Date     Advertiser
#2016-01  A    50000
#         C     4000
#         D    24000
#2016-02  A     6800
#         B     7800
#2016-03  B     1111
#         E     8600
#         F      500
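Note that x >= 500 keeps F (a count of exactly 500); use x > 500 if you want that row dropped, as in your desired table.

For the follow-up, one option is to sum the per-advertiser counts within each Date and keep only the dates whose total is above the threshold, using a level-based groupby with transform('sum'). This is a minimal sketch on the same counts Series; the 15000 threshold is the one from your question, so adjust it as needed:

counts = df.groupby(['Date','Advertiser']).ID.count()
# Total count per Date, broadcast back onto the (Date, Advertiser) index
date_totals = counts.groupby(level='Date').transform('sum')
# Keep only the rows belonging to dates whose total count exceeds 15000
counts[date_totals > 15000]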