Replace NaN values in all levels of a Pandas MultiIndex

After reading in an Excel sheet with a MultiIndex, I get np.nan in the index because some of the values are 'N/A' and pd.read_excel converts them to missing values. However, I want to keep them as 'N/A' to preserve the MultiIndex. I thought it would be easy to change them back using MultiIndex.fillna, but that raises an error. Here is a minimal example:

import numpy as np
import pandas as pd

index = pd.MultiIndex(levels=[['foo', 'bar'], ['one', np.nan]],
                      codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
                      names=['first', 'second'])
df = pd.DataFrame(index=index, columns=['A', 'B'])
df

df.index.fillna("N/A")

Output:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-17-09e14dcdc74f> in <module>
----> 1 df.index.fillna("N/A")

/anaconda3/envs/torch/lib/python3.7/site-packages/pandas/core/indexes/multi.py in fillna(self, value, downcast)
   1456         fillna is not implemented for MultiIndex
   1457         """
-> 1458         raise NotImplementedError("isna is not defined for MultiIndex")
   1459 
   1460     @Appender(_index_shared_docs["dropna"])

NotImplementedError: isna is not defined for MultiIndex

Update:

Code updated to reflect pandas 1.0.2. Before version 0.24.0, the codes attribute of pd.MultiIndex was called labels. The traceback message also changed from "isnull is not defined" to "isna is not defined", as shown above.

Bill asked Jan 06 '23 09:01


2 Answers

The accepted solution did not work for me either. It still left NA values in the index even though inspecting the df.index.levels individually did not show NA values.
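A likely reason, sketched here on the assumption that the index was built from arrays containing NaN (roughly what a file reader does): pandas stores the missing entries as code -1 instead of as a level value, so filling the levels has nothing to replace. The sample data and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Build the index from arrays, roughly as a file reader would (assumption).
index = pd.MultiIndex.from_arrays(
    [["foo", "foo", "bar", "bar"], ["one", np.nan, "one", np.nan]],
    names=["first", "second"],
)

# Missing entries are encoded as code -1; the level itself holds no NaN.
second_codes = [int(c) for c in index.codes[1]]
level_has_nan = bool(index.levels[1].isna().any())

# Filling the levels therefore leaves the index values unchanged.
refilled = index.set_levels([lvl.fillna("N/A") for lvl in index.levels])
still_missing = bool(refilled.get_level_values("second").isna().any())

print(second_codes, level_has_nan, still_missing)
```

If still_missing comes out True on your data as well, filling the levels alone will not be enough and the index values themselves have to be rebuilt.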

Jorge's solution pointed me in the right direction but also wasn't quite right for my case. Here is my approach, including handling of the single Index case as discussed in the comments of the accepted answer.

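my_fillna_value = "N/A"  # placeholder; use whatever value should replace NaN

if isinstance(df.index, pd.MultiIndex):
    # Convert the index to a frame, fill the values, then rebuild the index
    df.index = pd.MultiIndex.from_frame(
        df.index.to_frame().fillna(my_fillna_value)
    )
else:
    df.index = df.index.fillna(my_fillna_value)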
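For reference, a minimal self-contained run of the MultiIndex branch. The sample index, the column A, and the 'N/A' fill value are made up for illustration, and to_frame(index=False) is used only to keep the intermediate frame simple.

```python
import numpy as np
import pandas as pd

# Illustrative data: NaN appears in the second index level.
index = pd.MultiIndex.from_arrays(
    [["foo", "foo", "bar", "bar"], ["one", np.nan, "one", np.nan]],
    names=["first", "second"],
)
df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=index)

# Round-trip through a frame: fill the values, then rebuild the index.
filled = df.index.to_frame(index=False).fillna("N/A")
df.index = pd.MultiIndex.from_frame(filled)

second_values = list(df.index.get_level_values("second"))
print(second_values)
```

The level names carry over because from_frame takes them from the frame's column names.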
totalhack answered Jan 08 '23 11:01


Use set_levels:

# The inplace argument of set_levels was removed in pandas 2.0, so assign the result back
df.index = df.index.set_levels([l.fillna('N/A') for l in df.index.levels])
df

piRSquared answered Jan 08 '23 09:01
