I have the following pandas DataFrame as input:
store_id  item_id  items_sold        date
       1        1           0  2015-12-28
       1        2           1  2015-12-28
       1        1           0  2015-12-28
       2        2           0  2015-12-28
       2        1           1  2015-12-29
       2        2           1  2015-12-29
       2        1           0  2015-12-29
       3        1           0  2015-12-30
       3        1           0  2015-12-30
I need to drop all rows for items that have never been sold in a particular store, i.e. the (store_id, item_id) pairs (1, 1) and (3, 1) in this DataFrame.
The output I expect is the following:
store_id  item_id  items_sold        date
       1        2           1  2015-12-28
       2        2           0  2015-12-28
       2        1           1  2015-12-29
       2        2           1  2015-12-29
       2        1           0  2015-12-29
I've figured out how to find the required (store_id, item_id) pairs using df.groupby()[].sum(), but I'm stuck on dropping them from the initial DataFrame.
Is this what you want?
In [30]: df[df.groupby(['store_id', 'item_id'])['items_sold'].transform('sum') > 0]
Out[30]:
   store_id  item_id  items_sold        date
1         1        2           1  2015-12-28
3         2        2           0  2015-12-28
4         2        1           1  2015-12-29
5         2        2           1  2015-12-29
6         2        1           0  2015-12-29
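For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the sample data in the question, and the variable names (totals, result, result_filter) are just illustrative. transform('sum') broadcasts each group's total back onto every row of that group, so the result lines up with the original index and can be used directly as a boolean mask. The groupby().filter() call at the end is an alternative I would expect to give the same rows, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "store_id":   [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "item_id":    [1, 2, 1, 2, 1, 2, 1, 1, 1],
    "items_sold": [0, 1, 0, 0, 1, 1, 0, 0, 0],
    "date": pd.to_datetime([
        "2015-12-28", "2015-12-28", "2015-12-28", "2015-12-28",
        "2015-12-29", "2015-12-29", "2015-12-29",
        "2015-12-30", "2015-12-30",
    ]),
})

# Per-row total of items_sold for that row's (store_id, item_id) pair.
# transform keeps the original index, unlike a plain .sum() on the groupby.
totals = df.groupby(["store_id", "item_id"])["items_sold"].transform("sum")

# Keep only rows whose pair sold at least one item overall.
result = df[totals > 0]
print(result)

# Alternative (an assumption on my part, not from the answer above):
# filter() drops whole groups for which the predicate is False.
result_filter = df.groupby(["store_id", "item_id"]).filter(
    lambda g: g["items_sold"].sum() > 0
)
```

On large frames the transform version is usually the faster of the two, since filter calls a Python function once per group.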