In pandas, how to pivot a dataframe on a categorical series with missing categories?

I have a pandas dataframe with a categorical series that has missing categories.

In the example shown below, group has the categories "a", "b", and "c", but there are no cases of "c" in the dataframe.

import pandas as pd
dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"], 
    "group": ["a", "a", "b", "b"], 
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
dfr.pivot(index="id", columns="group")

The resulting pivoted dataframe has columns a and b. I expected a c column as well, containing only missing values.

      value      
group     a     b
id               
111     1.0   9.0
222     4.0   NaN
333     NaN  16.0

How can I pivot a dataframe on a categorical series to include columns with all categories, regardless of whether they were present in the original dataframe?

asked Dec 01 '21 15:12

Richie Cotton

2 Answers

pd.pivot_table has a dropna argument that controls whether columns whose entries are all NaN are dropped.

Try setting it to False:

import pandas as pd
dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"], 
    "group": ["a", "a", "b", "b"], 
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
pd.pivot_table(dfr, index="id", columns="group", dropna=False)
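
Two caveats are worth checking against your pandas version, so treat the following as a minimal sketch rather than a definitive recipe. First, pivot_table aggregates (the default aggfunc is "mean"), so duplicate id/group pairs are averaged instead of raising an error as pivot does. Second, pivot_table also has an observed argument for categorical groupers; its default has been changing across releases, so the sketch passes observed=False explicitly, on the assumption that you want unobserved categories kept. The variable name result is only illustrative.

```python
import pandas as pd

dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"],
    "group": ["a", "a", "b", "b"],
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])

# dropna=False keeps columns that are entirely NaN.
# observed=False (passed explicitly here as an assumption about the desired
# behaviour) keeps categories that never occur in the data.
result = pd.pivot_table(
    dfr,
    index="id",
    columns="group",
    dropna=False,
    observed=False,
)

# The last column level should now list every declared category.
print(list(result.columns.get_level_values(-1)))
print(result)
```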

answered Nov 01 '22 18:11

Learning is a mess

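You can reindex the pivoted columns against the full list of categories. This works even if your value column is not numeric, whereas pivot_table with its default mean aggregation needs numeric values: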

all_columns = pd.MultiIndex.from_product([["value"], dfr["group"].cat.categories])
output = dfr.pivot(index="id", columns="group").reindex(columns=all_columns)

>>> output
    value          
        a     b   c
id                 
111   1.0   9.0 NaN
222   4.0   NaN NaN
333   NaN  16.0 NaN
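
To illustrate the non-numeric case, here is a minimal sketch using a hypothetical string column called label (it is not in the question's data). Passing values= to pivot gives flat columns, so a plain reindex over the declared categories is enough and no MultiIndex is needed.

```python
import pandas as pd

# Hypothetical data: same layout as the question, but with string values.
df = pd.DataFrame({
    "id": ["111", "222", "111", "333"],
    "group": ["a", "a", "b", "b"],
    "label": ["x", "y", "z", "w"]})
df["group"] = pd.Categorical(df["group"], categories=["a", "b", "c"])

# With a single values column the pivoted columns are just the observed
# groups, so reindexing on the full category list adds the missing "c".
wide = (
    df.pivot(index="id", columns="group", values="label")
      .reindex(columns=list(df["group"].cat.categories))
)

print(wide)
```

Because nothing is aggregated here, this still raises an error if an id/group pair occurs more than once, just as a bare pivot would.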

answered Nov 01 '22 17:11

not_speshal

Tags: python, pandas, pivot, categorical-data