I have a dataset made up entirely of categorical variables, and I would like to produce frequency counts for all of the variables at once.
For example, with the Iris dataset, df['class'].value_counts() only counts the values of a single variable.
I want to analyze every variable in a dataset that consists only of categorical variables and was loaded from a CSV file with pandas. My idea is to extract only the first row and loop over it with a for loop. To get the first row, I convert the CSV data to a DataFrame with data = pd.DataFrame(data). However, data[0] raises an error.
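A minimal sketch of the for-loop idea, assuming the data is already in a DataFrame called df (the inline data below stands in for pd.read_csv, and the file name in the comment is a placeholder). With pd.read_csv's default header=0, the CSV's first line already becomes df.columns, so there is nothing to extract by hand. data[0] most likely fails because [] on a DataFrame looks up a column label, not a row position; positional row access goes through .iloc.

```python
import pandas as pd

# Sample data from the question, built inline. In practice this would be
# something like pd.read_csv("your_file.csv") (placeholder file name).
df = pd.DataFrame({
    "Mary":  ["a", "a", "a", "a", "a", "b"],
    "John":  ["t", "t", "u", "u", "u", "t"],
    "David": ["y", "n", "y", "n", "n", "y"],
    "Jenny": ["n", "y", "y", "y", "n", "n"],
})

# df[0] would raise KeyError: [] selects a column by label, and no column is named 0.
# The first data row by position is df.iloc[0].
first_row = df.iloc[0]

# The variable names are already available as df.columns, so loop over them
# and count each variable separately.
counts = {}
for col in df.columns:
    counts[col] = df[col].value_counts()
    print(counts[col], end="\n\n")
```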
What is the most efficient way of producing frequency analysis or bar graphs for all variables?
Sample dataset with categorical variables:
Mary John David Jenny
a t y n
a t n y
a u y y
a u n y
a u n n
b t y n
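To make the methods below easy to try, here is one way to load the sample above into a DataFrame. This is a sketch that assumes the sample is pasted as whitespace-separated text; io.StringIO stands in for a real file, and sep=r"\s+" treats any run of spaces as the delimiter.

```python
import io

import pandas as pd

# The sample table from the question, as whitespace-separated text.
sample = """Mary John David Jenny
a t y n
a t n y
a u y y
a u n y
a u n n
b t y n
"""

# The first line becomes the column names (header=0 is the default).
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")
print(df)
```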
Method 1
df.apply(lambda x: x.value_counts()).T.stack()
Output:
Mary   a    5.0
       b    1.0
John   t    3.0
       u    3.0
David  n    3.0
       y    3.0
Jenny  n    3.0
       y    3.0
dtype: float64
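The counts in Method 1 come out as floats because apply fills missing (variable, category) combinations with NaN before stack drops them. If integer counts are preferred, one alternative (pd.concat, swapped in here rather than part of the method above) is to count each column on its own and join the results, using the column names as keys. This is a sketch: it gives the same (variable, category) pairs, but within each variable the categories follow value_counts order (most frequent first) instead of alphabetical order.

```python
import pandas as pd

df = pd.DataFrame({
    "Mary":  ["a", "a", "a", "a", "a", "b"],
    "John":  ["t", "t", "u", "u", "u", "t"],
    "David": ["y", "n", "y", "n", "n", "y"],
    "Jenny": ["n", "y", "y", "y", "n", "n"],
})

# One value_counts() per column, stacked under the column name as the outer
# index level. No NaN gaps are created, so the dtype stays integer.
counts = pd.concat(
    [df[col].value_counts() for col in df.columns],
    keys=df.columns,
)
print(counts)
```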
Method 2
df.apply(pd.Series.value_counts).T.fillna(0)
Output:
         a    b    n    t    u    y
Mary   5.0  1.0  0.0  0.0  0.0  0.0
John   0.0  0.0  0.0  3.0  3.0  0.0
David  0.0  0.0  3.0  0.0  0.0  3.0
Jenny  0.0  0.0  3.0  0.0  0.0  3.0
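The Method 2 table is also float, for the same reason: the NaN gaps force a float dtype before fillna(0) replaces them. A small sketch that casts the table back to whole-number counts with astype(int); each row should then sum to the number of observations (6 in the sample).

```python
import pandas as pd

df = pd.DataFrame({
    "Mary":  ["a", "a", "a", "a", "a", "b"],
    "John":  ["t", "t", "u", "u", "u", "t"],
    "David": ["y", "n", "y", "n", "n", "y"],
    "Jenny": ["n", "y", "y", "y", "n", "n"],
})

# Rows = variables, columns = categories; missing combinations become 0,
# then the float table is cast back to integers.
table = df.apply(pd.Series.value_counts).T.fillna(0).astype(int)
print(table)
```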
Then you can create a bar chart with:
df.apply(pd.Series.value_counts).T.stack().plot(kind='bar')
Alternatively, you can use:
df.apply(pd.Series.value_counts).fillna(0).T.plot(kind='bar')
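If a separate bar chart per variable is easier to read than one combined chart, a possible sketch is one matplotlib subplot per column. Assumptions: matplotlib is installed, the non-interactive Agg backend is used so the script runs without a display, and the output file name frequency_counts.png is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to file, no window needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Mary":  ["a", "a", "a", "a", "a", "b"],
    "John":  ["t", "t", "u", "u", "u", "t"],
    "David": ["y", "n", "y", "n", "n", "y"],
    "Jenny": ["n", "y", "y", "y", "n", "n"],
})

# One subplot per variable; sort_index() puts categories in alphabetical order.
fig, axes = plt.subplots(1, len(df.columns), figsize=(3 * len(df.columns), 3))
for ax, col in zip(axes, df.columns):
    df[col].value_counts().sort_index().plot(kind="bar", ax=ax, title=col)
    ax.set_ylabel("count")

fig.tight_layout()
fig.savefig("frequency_counts.png")  # hypothetical output file name
```

In a notebook, the matplotlib.use("Agg") line and savefig can be dropped so the charts display inline.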