I have a dataframe
from a multiple choice questions and it is formatted like so:
Sex Qu1 Qu2 Qu3
Name
Bob M 1 2 1
John M 3 3 5
Alex M 4 1 2
Jen F 3 2 4
Mary F 4 3 4
The data is a rating from 1 to 5 for the 3 multiple choice questions. I want rearrange the data so that the index is range(1,6) where 1='bad', 2='poor', 3='ok', 4='good', 5='excellent', the columns are the same and the data is the count of the number occurrences of the values (excluding the Sex column). This is basically a histogram of fixed bin sizes and the x-axis labeled with strings. I like the output of df.plot()
much better than df.hist()
for this but I can't figure out how to rearrange the table to give me a histogram of data. Also, how do you change x-labels to be strings?
Series.value_counts
gives you the histogram you're looking for:
In [9]: df['Qu1'].value_counts()
Out[9]:
4 2
3 2
1 1
So, apply this function to each of those 3 columns:
In [13]: table = df[['Qu1', 'Qu2', 'Qu3']].apply(lambda x: x.value_counts())
In [14]: table
Out[14]:
Qu1 Qu2 Qu3
1 1 1 1
2 NaN 2 1
3 2 2 NaN
4 2 NaN 2
5 NaN NaN 1
In [15]: table = table.fillna(0)
In [16]: table
Out[16]:
Qu1 Qu2 Qu3
1 1 1 1
2 0 2 1
3 2 2 0
4 2 0 2
5 0 0 1
Using table.reindex
or table.ix[some_array]
you can rearrange the data.
To transform to strings, use table.rename:
In [17]: table.rename(index=str)
Out[17]:
Qu1 Qu2 Qu3
1 1 1 1
2 0 2 1
3 2 2 0
4 2 0 2
5 0 0 1
In [18]: table.rename(index=str).index[0]
Out[18]: '1'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With