
Explanation about pandas value_counts function

Tags: python, pandas

Can someone please explain what the line

result = data.apply(pd.value_counts).fillna(0)

does here?

import pandas as pd 
from pandas import Series, DataFrame

data = DataFrame({'Qu1': [1, 3, 4, 3, 4],
                  'Qu2': [2, 3, 1, 2, 3],
                  'Qu3': [1, 5, 2, 4, 4]})

result = data.apply(pd.value_counts).fillna(0)  

In [26]: data
Out[26]:
   Qu1  Qu2  Qu3
0    1    2    1
1    3    3    5
2    4    1    2
3    3    2    4
4    4    3    4

In [27]: result
Out[27]:
   Qu1  Qu2  Qu3
1    1    1    1
2    0    2    1
3    2    2    0
4    2    0    2
5    0    0    1
Quazi Farhan asked Dec 25 '22 13:12


2 Answers

I think the easiest way to understand what's going on is to break it down.

On each column, value_counts simply counts the number of occurrences of each value in the Series (e.g. 4 appears twice in the Qu1 column):

In [11]: pd.value_counts(data.Qu1)
Out[11]:
4    2
3    2
1    1
dtype: int64

When you do an apply, each column's result is realigned with the results for the other columns, on the union of their indexes. Since every value between 1 and 5 is seen somewhere in the data, each result is aligned with range(1, 6):

In [12]: pd.value_counts(data.Qu1).reindex(range(1, 6))
Out[12]:
1     1
2   NaN
3     2
4     2
5   NaN
dtype: float64
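
The range(1, 6) above is written out by hand for this particular data. As a rough sketch, the same alignment target can be derived from the data itself by collecting every value seen in any column. The names union_index and qu1_aligned are illustrative only, and the method form Series.value_counts() is used here because, as far as I know, the top-level pd.value_counts is deprecated in newer pandas releases:

```python
import pandas as pd

data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
                     'Qu2': [2, 3, 1, 2, 3],
                     'Qu3': [1, 5, 2, 4, 4]})

# Every distinct value that appears anywhere in the frame, sorted.
# This plays the role of the hand-written range(1, 6).
union_index = sorted(set(data.to_numpy().ravel().tolist()))

# Counts for a single column, then aligned to the shared index.
# Values never seen in Qu1 (here 2 and 5) come out as NaN.
qu1_aligned = data['Qu1'].value_counts().reindex(union_index)

print(union_index)
print(qu1_aligned)
```

The NaN entries are also why the dtype changes from int64 to float64 after the reindex.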

You want to count the values you didn't see as 0 rather than NaN, hence the fillna:

In [13]: pd.value_counts(data.Qu1).reindex(range(1, 6)).fillna(0)
Out[13]:
1    1
2    0
3    2
4    2
5    0
dtype: float64

When you do the apply, it concatenates the results of doing this for each column:

In [14]: pd.concat((pd.value_counts(data[col]).reindex(range(1, 6)).fillna(0)
                       for col in data.columns),
                   axis=1, keys=data.columns)
Out[14]:
   Qu1  Qu2  Qu3
1    1    1    1
2    0    2    1
3    2    2    0
4    2    0    2
5    0    0    1
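
To check that this breakdown really matches the one-liner from the question, the two can be compared directly. This is only a sketch: it uses the method form Series.value_counts() inside a lambda instead of pd.value_counts (an assumption made so it also runs on newer pandas), and the names manual and same are made up for the example:

```python
import pandas as pd

data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
                     'Qu2': [2, 3, 1, 2, 3],
                     'Qu3': [1, 5, 2, 4, 4]})

# The one-liner from the question, using the Series method.
result = data.apply(lambda s: s.value_counts()).fillna(0)

# The manual version: count, align to 1..5, fill the gaps, glue columns.
manual = pd.concat(
    [data[col].value_counts().reindex(range(1, 6)).fillna(0)
     for col in data.columns],
    axis=1, keys=data.columns)

# Compare the raw values row by row (index sorted to be safe).
same = result.sort_index().to_numpy().tolist() == manual.to_numpy().tolist()
print(same)
```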
Andy Hayden answered Jan 08 '23 14:01


From the docs, it produces a histogram of non-null values. Looking just at column Qu1 of result, we can tell that there is one 1, zero 2's, two 3's, two 4's, and zero 5's in the original column data.Qu1.
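
The "non-null" part can be seen with a small made-up Series (not from the question's data) that contains a missing entry. By default value_counts skips NaN, and passing dropna=False gives NaN its own row:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing answer.
s = pd.Series([1, 3, np.nan, 3, 4])

counts = s.value_counts()                       # NaN is ignored by default
counts_with_nan = s.value_counts(dropna=False)  # NaN is counted as well

print(counts.sum())
print(counts_with_nan.sum())
```

The two totals differ by the number of missing entries in the Series.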

U2EF1 answered Jan 08 '23 14:01