I have a data set that looks like the following:
student question answer number
Bob How many donuts in a dozen? A 1
Sally How many donuts in a dozen? C 1
Edward How many donuts in a dozen? A 1
....
Edward What colour is the sky? C 1
Marvin What colour is the sky? D 1
From which I wrote some code that generates a pivot table to total up the results of a test, like so:
data = pd.pivot_table(df,index=['question'],columns = ['answer'],aggfunc='count',fill_value = 0)
number
answer A B C D
question
How many donuts in a dozen? 1 4 3 2
What colour is the sky? 1 9 0 0
From there I'm creating a heatmap from the pivot table for visualization purposes. Generally this works. However, if for some reason there are no students in the selected set who have chosen one of the answers (say, no one selected "D" for any questions) then that column doesn't show up in the heatmap; the column is left off.
How can I ensure that all the required columns display in the heatmap, even if no one selected that answer?
In order to check missing values in Pandas DataFrame, we use a function isnull() and notnull(). Both function help in checking whether a value is NaN or not. These function can also be used in Pandas Series in order to find null values in a series.
Pandas DataFrame dropna() Function how: possible values are {'any', 'all'}, default 'any'. If 'any', drop the row/column if any of the values is null. If 'all', drop the row/column if all the values are missing. thresh: an int value to specify the threshold for the drop operation.
Pandas pivot tables work in a very similar way to those found in spreadsheet tools such as Excel. The pivot table function takes in a data frame, some parameters detailing the shape you want the data to take and the outputs is summarised data in the form of a pivot table.
I think an even simpler approach would be to add 'dropna = False' to the pivot table parameters, default behavior is set to 'True'. This worked for me in a similar situation with time series data that contained large swaths of days with NaNs.
pd.pivot_table(dropna = False)
You can take all possible answers and reindex
your result. For example, in the small sample you have provided, no student selected B
. Let's say your options are A, B, C, D:
answers = [*'ABCD']
res = df.pivot_table(
index='question',
columns='answer',
values='number',
aggfunc='sum',
fill_value=0
).reindex(answers, axis=1, fill_value=0)
answer A B C D
question
How many donuts in a dozen? 2 0 1 0
What colour is the sky? 0 0 1 1
The corresponding heatmap:
import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(res, annot=True)
plt.tight_layout()
plt.show()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With