Python: How to generate frequency count for all variables

Question

I have a dataset of all categorical variables, and I would like to produce frequency counts for all variables at once.

For example, with the Iris dataset, df['class'].value_counts() only gives me the counts for one variable.

I want to analyze every variable in a dataset that consists only of categorical variables and is loaded from a CSV file through Pandas. I'm thinking of extracting only the first row and putting it in a for loop. To extract the first row from the CSV file, I convert it to a DataFrame with data = pd.DataFrame(data). However, data[0] raises an error.

What is the most efficient way of producing frequency analysis or bar graphs for all variables?

Sample dataset with categorical variables:

   Mary  John   David    Jenny
    a     t       y        n
    a     t       n        y
    a     u       y        y
    a     u       n        y
    a     u       n        n
    b     t       y        n
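
To make the snippets in this thread reproducible, the sample table can be rebuilt as a DataFrame. This is a minimal sketch: the values are copied from the table above, and the variable name df is assumed to match the one used in the answer.

```python
import pandas as pd

# Sample data copied column by column from the table above.
df = pd.DataFrame({
    "Mary":  ["a", "a", "a", "a", "a", "b"],
    "John":  ["t", "t", "u", "u", "u", "t"],
    "David": ["y", "n", "y", "n", "n", "y"],
    "Jenny": ["n", "y", "y", "y", "n", "n"],
})

# Counting a single column works as described in the question.
mary_counts = df["Mary"].value_counts()
print(mary_counts)
```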

harvpan · Accepted Answer

Method 1

df.apply(lambda x: x.value_counts()).T.stack()

Output:

Mary   a    5.0
       b    1.0
John   t    3.0
       u    3.0
David  n    3.0
       y    3.0
Jenny  n    3.0
       y    3.0
dtype: float64
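
The counts above come out as floats because apply aligns every column's result on the union of all values, and the resulting NaN gaps force a float dtype. If integer counts per variable are preferred, one possible alternative (not part of the original answer) is the explicit loop the question hints at, sketched here with the sample data rebuilt so it runs on its own:

```python
import pandas as pd

# Sample data from the question, rebuilt so this sketch is self-contained.
df = pd.DataFrame({
    "Mary":  ["a", "a", "a", "a", "a", "b"],
    "John":  ["t", "t", "u", "u", "u", "t"],
    "David": ["y", "n", "y", "n", "n", "y"],
    "Jenny": ["n", "y", "y", "y", "n", "n"],
})

# One value_counts() call per column; no alignment happens across columns,
# so each Series keeps its integer counts.
counts = {col: df[col].value_counts() for col in df.columns}

for col, vc in counts.items():
    print(f"--- {col} ---")
    print(vc.to_string())
```

The dictionary keeps one Series per variable, which is convenient when the variables do not share categories.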

Method 2

df.apply(pd.Series.value_counts).T.fillna(0)

Output:

          a   b   n   t   u   y
Mary    5.0 1.0 0.0 0.0 0.0 0.0
John    0.0 0.0 0.0 3.0 3.0 0.0
David   0.0 0.0 3.0 0.0 0.0 3.0
Jenny   0.0 0.0 3.0 0.0 0.0 3.0
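
Since the missing combinations are filled with 0, the table can also be cast back to integers. A small sketch, assuming the sample data from the question; the variable name table is only illustrative:

```python
import pandas as pd

# Sample data from the question, rebuilt so this sketch is self-contained.
df = pd.DataFrame({
    "Mary":  ["a", "a", "a", "a", "a", "b"],
    "John":  ["t", "t", "u", "u", "u", "t"],
    "David": ["y", "n", "y", "n", "n", "y"],
    "Jenny": ["n", "y", "y", "y", "n", "n"],
})

# Count per column, transpose so variables become rows,
# fill the gaps with 0, then cast the float counts to int.
table = df.apply(lambda col: col.value_counts()).T.fillna(0).astype(int)
print(table)
```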

Then you can use the following to create a bar chart:

df.apply(pd.Series.value_counts).T.stack().plot(kind='bar')


Alternatively, you can use:

df.apply(pd.Series.value_counts).fillna(0).T.plot(kind='bar')
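
If one chart per variable is easier to read than a single combined chart, a possible variation (an assumption on my part, not from the original answer) is to draw each column's counts on its own subplot. The sketch below assumes matplotlib is installed and selects the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, rebuilt so this sketch is self-contained.
df = pd.DataFrame({
    "Mary":  ["a", "a", "a", "a", "a", "b"],
    "John":  ["t", "t", "u", "u", "u", "t"],
    "David": ["y", "n", "y", "n", "n", "y"],
    "Jenny": ["n", "y", "y", "y", "n", "n"],
})

# One subplot per variable, sharing the y-axis so bar heights are comparable.
fig, axes = plt.subplots(1, len(df.columns), figsize=(12, 3), sharey=True)
for ax, col in zip(axes, df.columns):
    df[col].value_counts().sort_index().plot(kind="bar", ax=ax, title=col)
fig.tight_layout()
```

Call plt.show() in an interactive session, or fig.savefig with a file name of your choice, to view the result.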


Tags: python, pandas, count, numpy

lydias
