Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Missing values in Pandas Pivot table?

Tags:

python

pandas

I have a data set that looks like the following:

student     question                        answer   number
Bob         How many donuts in a dozen?       A        1
Sally       How many donuts in a dozen?       C        1
Edward      How many donuts in a dozen?       A        1
....
Edward      What colour is the sky?           C        1
Marvin      What colour is the sky?           D        1

From which I wrote some code that generates a pivot table to total up the results of a test, like so:

data = pd.pivot_table(df,index=['question'],columns = ['answer'],aggfunc='count',fill_value = 0)

                                   number
                     answer     A    B   C   D
       question
How many donuts in a dozen?     1    4   3   2
What colour is the sky?         1    9   0   0

From there I'm creating a heatmap from the pivot table for visualization purposes. Generally this works. However, if for some reason there are no students in the selected set who have chosen one of the answers (say, no one selected "D" for any questions) then that column doesn't show up in the heatmap; the column is left off.

How can I ensure that all the required columns display in the heatmap, even if no one selected that answer?

like image 239
NoobsterNoob Avatar asked Jun 19 '19 16:06

NoobsterNoob


People also ask

How do I show missing values in pandas?

In order to check missing values in Pandas DataFrame, we use a function isnull() and notnull(). Both function help in checking whether a value is NaN or not. These function can also be used in Pandas Series in order to find null values in a series.

How do you drop all columns with missing values in pandas?

Pandas DataFrame dropna() Function how: possible values are {'any', 'all'}, default 'any'. If 'any', drop the row/column if any of the values is null. If 'all', drop the row/column if all the values are missing. thresh: an int value to specify the threshold for the drop operation.

How do pivot tables work in pandas?

Pandas pivot tables work in a very similar way to those found in spreadsheet tools such as Excel. The pivot table function takes in a data frame, some parameters detailing the shape you want the data to take and the outputs is summarised data in the form of a pivot table.


2 Answers

I think an even simpler approach would be to add 'dropna = False' to the pivot table parameters, default behavior is set to 'True'. This worked for me in a similar situation with time series data that contained large swaths of days with NaNs.

pd.pivot_table(dropna = False)
like image 192
Andrew Avatar answered Sep 27 '22 18:09

Andrew


You can take all possible answers and reindex your result. For example, in the small sample you have provided, no student selected B. Let's say your options are A, B, C, D:


answers = [*'ABCD']

res = df.pivot_table(
  index='question',
  columns='answer',
  values='number',
  aggfunc='sum',
  fill_value=0
).reindex(answers, axis=1, fill_value=0)

answer                       A  B  C  D
question
How many donuts in a dozen?  2  0  1  0
What colour is the sky?      0  0  1  1

The corresponding heatmap:

import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(res, annot=True)
plt.tight_layout()
plt.show()

enter image description here

like image 25
user3483203 Avatar answered Sep 27 '22 19:09

user3483203