Missing values in Pandas Pivot table?

Tags:

pandas

I have a data set that looks like the following:

student     question                        answer   number
Bob         How many donuts in a dozen?       A        1
Sally       How many donuts in a dozen?       C        1
Edward      How many donuts in a dozen?       A        1
....
Edward      What colour is the sky?           C        1
Marvin      What colour is the sky?           D        1

From this I wrote some code that generates a pivot table to total up the results of a test, like so:

data = pd.pivot_table(df, index=['question'], columns=['answer'], aggfunc='count', fill_value=0)

                                   number
                     answer     A    B   C   D
       question
How many donuts in a dozen?     1    4   3   2
What colour is the sky?         1    9   0   0

From there I'm creating a heatmap from the pivot table for visualization purposes. Generally this works. However, if for some reason there are no students in the selected set who have chosen one of the answers (say, no one selected "D" for any questions) then that column doesn't show up in the heatmap; the column is left off.

How can I ensure that all the required columns display in the heatmap, even if no one selected that answer?


asked Jun 19 '19 16:06

NoobsterNoob

2 Answers

I think an even simpler approach would be to add 'dropna=False' to the pivot table parameters; the default is 'True'. This worked for me in a similar situation with time series data that contained large swaths of days with NaNs.

pd.pivot_table(df, index='question', columns='answer', aggfunc='count', fill_value=0, dropna=False)
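One caveat, as far as I understand it: dropna=False can only keep columns that pandas already knows about, so an answer that never occurs in the data would still be missing. A minimal sketch of one way around that is to declare the answer column as a categorical with every option listed. The A to D answer set and the sample rows (taken from the question) are assumptions here, and observed=False is passed explicitly because its default differs between pandas versions.

```python
import pandas as pd

# Sample rows copied from the question; the A-D answer set is an assumption.
df = pd.DataFrame({
    "student": ["Bob", "Sally", "Edward", "Edward", "Marvin"],
    "question": ["How many donuts in a dozen?"] * 3
                + ["What colour is the sky?"] * 2,
    "answer": ["A", "C", "A", "C", "D"],
    "number": [1, 1, 1, 1, 1],
})

# Declare every possible answer up front so pandas knows about "B"
# even though nobody picked it.
df["answer"] = pd.Categorical(df["answer"], categories=list("ABCD"))

counts = pd.pivot_table(
    df,
    index="question",
    columns="answer",
    values="number",
    aggfunc="count",
    fill_value=0,
    dropna=False,
    observed=False,  # keep categories that have no rows in the data
)
print(counts)
```

If the answer column stays a plain string column, this will likely not add the missing option, and reindexing the result is the safer route.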


answered Sep 27 '22 18:09

Andrew

You can take all possible answers and reindex your result. For example, in the small sample you have provided, no student selected B. Let's say your options are A, B, C, D:

answers = [*'ABCD']

res = df.pivot_table(
  index='question',
  columns='answer',
  values='number',
  aggfunc='sum',
  fill_value=0
).reindex(answers, axis=1, fill_value=0)

answer                       A  B  C  D
question
How many donuts in a dozen?  2  0  1  0
What colour is the sky?      0  0  1  1

The corresponding heatmap:

import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(res, annot=True)
plt.tight_layout()
plt.show()
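To try the reindex approach on a filtered set of students, here is a small self-contained sketch. The helper name answer_counts, the ANSWERS list and the choice of students in the subset are all made up for illustration; the sample rows come from the question.

```python
import pandas as pd

ANSWERS = list("ABCD")  # assumed full set of answer options


def answer_counts(frame, answers=ANSWERS):
    """Hypothetical helper: pivot, then force one column per answer."""
    table = frame.pivot_table(
        index="question",
        columns="answer",
        values="number",
        aggfunc="sum",
        fill_value=0,
    )
    # Add any answer nobody chose as an all-zero column, in a fixed order.
    return table.reindex(columns=answers, fill_value=0)


# Sample rows copied from the question.
df = pd.DataFrame({
    "student": ["Bob", "Sally", "Edward", "Edward", "Marvin"],
    "question": ["How many donuts in a dozen?"] * 3
                + ["What colour is the sky?"] * 2,
    "answer": ["A", "C", "A", "C", "D"],
    "number": [1, 1, 1, 1, 1],
})

# A selected set in which only answers A and C occur.
subset = df[df["student"].isin(["Bob", "Sally"])]
res_subset = answer_counts(subset)
print(res_subset)
```

Only the columns are padded here. A question that nobody in the subset answered would still be absent from the rows, and could be added the same way by reindexing on the index with a list of all questions.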


answered Sep 27 '22 19:09

user3483203
