Python: How to generate frequency count for all variables

I have a dataset of all categorical variables, and I would like to produce frequency counts for all variables at once.

For example, with the Iris dataset, df['class'].value_counts() only gives me the counts for one variable.

I want to analyze every variable in a dataset that consists only of categorical variables, loaded from a CSV file through Pandas. I'm thinking of extracting only the first row and putting it in a for loop. To extract the first row from the CSV file, I convert it to a DataFrame using data = pd.DataFrame(data). However, data[0] raises an error.

What is the most efficient way of producing frequency analysis or bar graphs for all variables?

Sample dataset with categorical variables:

   Mary  John   David    Jenny
    a     t       y        n
    a     t       n        y
    a     u       y        y
    a     u       n        y
    a     u       n        n
    b     t       y        n
lydias asked Mar 05 '23 17:03

1 Answer

Method 1

df.apply(lambda x: x.value_counts()).T.stack()

Output:

Mary   a    5.0
       b    1.0
John   t    3.0
       u    3.0
David  n    3.0
       y    3.0
Jenny  n    3.0
       y    3.0
dtype: float64
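
The loop described in the question can also be written out directly. The following is a minimal sketch, assuming the sample data from the question is held in a DataFrame named df (it is rebuilt by hand here so the snippet runs on its own; counts and stacked are illustrative names, not part of the original answer). It iterates over df.columns rather than indexing data[0], calls value_counts() once per variable, and joins the results with pd.concat, so it does not depend on how stack() treats missing values.

```python
import pandas as pd

# Sample data copied from the question, one string of values per column.
df = pd.DataFrame({
    "Mary":  list("aaaaab"),
    "John":  list("ttuuut"),
    "David": list("ynynny"),
    "Jenny": list("nyyynn"),
})

# Explicit loop over the column names: one value_counts() call per variable.
counts = {col: df[col].value_counts() for col in df.columns}

# Join the per-column results into one Series.
# The dict keys become the outer index level, giving a (variable, value) index.
stacked = pd.concat(counts)
print(stacked)
```

Because no NaN placeholders are ever created, the counts stay integers instead of being converted to floats.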

Method 2

df.apply(pd.Series.value_counts).T.fillna(0)

Output:

          a   b   n   t   u   y
Mary    5.0 1.0 0.0 0.0 0.0 0.0
John    0.0 0.0 0.0 3.0 3.0 0.0
David   0.0 0.0 3.0 0.0 0.0 3.0
Jenny   0.0 0.0 3.0 0.0 0.0 3.0
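
For reference, here is a self-contained sketch of the same table, assuming the sample data from the question (rebuilt by hand; the name table is illustrative). It passes pd.Series.value_counts to apply, since the top-level pd.value_counts has been deprecated since pandas 2.1, and it adds astype(int), which is an optional extra step that turns the float counts back into integers once the NaN cells are filled.

```python
import pandas as pd

# Sample data copied from the question, one string of values per column.
df = pd.DataFrame({
    "Mary":  list("aaaaab"),
    "John":  list("ttuuut"),
    "David": list("ynynny"),
    "Jenny": list("nyyynn"),
})

# Count the values in every column, put variables on the rows,
# fill combinations that never occur with 0, and cast the counts to int.
table = df.apply(pd.Series.value_counts).T.fillna(0).astype(int)
print(table)
```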

Then you can use the following to create a bar chart.

df.apply(pd.Series.value_counts).T.stack().plot(kind='bar')

Output:

[image: bar chart with one bar for each (variable, value) pair]

Alternatively, you can use:

df.apply(pd.Series.value_counts).fillna(0).T.plot(kind='bar')

Output:

[image: grouped bar chart with one group per variable and one bar per value]
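
A runnable sketch of the grouped chart, assuming matplotlib is installed and using the sample data from the question (rebuilt by hand). The axis labels, the legend title, and the file name in the final comment are illustrative additions, not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data copied from the question, one string of values per column.
df = pd.DataFrame({
    "Mary":  list("aaaaab"),
    "John":  list("ttuuut"),
    "David": list("ynynny"),
    "Jenny": list("nyyynn"),
})

# Rows are variables and columns are values; missing combinations become 0.
table = df.apply(pd.Series.value_counts).fillna(0).T

# One group of bars per variable, one bar per value within each group.
ax = table.plot(kind="bar", rot=0)
ax.set_xlabel("variable")
ax.set_ylabel("count")
ax.legend(title="value")
plt.tight_layout()

# Call plt.show() to display the chart, or
# ax.figure.savefig("counts.png") to write it to a (hypothetical) file.
```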

harvpan answered Mar 22 '23 22:03
