Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas: Count number of times each unique value appears for multiple columns

Tags:

pandas

Suppose I have a DataFrame such as,

In [7]: source = pd.DataFrame([['amazon.com', 'correct', 'correct'], ['amazon.com', 'incorrect', 'correct'], ['walmart.com', 'incorrect', 'correct'], ['walmart.com', 'incorrect', 'incorrect']], columns=['domain', 'price', 'product'])

In [8]: source
Out[8]:
        domain      price    product
0   amazon.com    correct    correct
1   amazon.com  incorrect    correct
2  walmart.com  incorrect    correct
3  walmart.com  incorrect  incorrect

I would like to count, for each domain, the number of times price == 'correct' and price == 'incorrect', and the same for product. In other words, I'd like to see output like so,

        domain      key      value  count
0   amazon.com    price    correct      1
1   amazon.com    price  incorrect      1
2   amazon.com  product    correct      2
3  walmart.com    price  incorrect      2
4  walmart.com  product    correct      1
5  walmart.com  product  incorrect      1

How do I do this?

like image 231
duckworthd Avatar asked Mar 23 '23 02:03

duckworthd


1 Answers

A nested apply will do it

In [24]: source.groupby('domain').apply(lambda x: 
                          x[['price','product']].apply(lambda y: y.value_counts())).fillna(0)

Out[24]: 
                       price  product
domain                               
amazon.com  correct        1        2
            incorrect      1        0
walmart.com correct        0        1
            incorrect      2        1
like image 163
Jeff Avatar answered Apr 06 '23 17:04

Jeff