Suppose I have a DataFrame such as,
In [7]: source = pd.DataFrame([['amazon.com', 'correct', 'correct'], ['amazon.com', 'incorrect', 'correct'], ['walmart.com', 'incorrect', 'correct'], ['walmart.com', 'incorrect', 'incorrect']], columns=['domain', 'price', 'product'])
In [8]: source
Out[8]:
domain price product
0 amazon.com correct correct
1 amazon.com incorrect correct
2 walmart.com incorrect correct
3 walmart.com incorrect incorrect
I would like to count, for each domain
, the number of times price == 'correct'
and price == 'incorrect'
, and the same for product
. In other words, I'd like to see output like so,
domain key value count
0 amazon.com price correct 1
1 amazon.com price incorrect 1
2 amazon.com product correct 2
3 walmart.com price incorrect 2
4 walmart.com product correct 1
5 walmart.com product incorrect 1
How do I do this?
A nested apply will do it
In [24]: source.groupby('domain').apply(lambda x:
x[['price','product']].apply(lambda y: y.value_counts())).fillna(0)
Out[24]:
price product
domain
amazon.com correct 1 2
incorrect 1 0
walmart.com correct 0 1
incorrect 2 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With