
Setting the types (dtypes) of a multiindex DataFrame

Tags: python, pandas

Say I have this MultiIndex DataFrame:

>>> df = pandas.DataFrame(index=range(3), columns=pandas.MultiIndex.from_product(
        (('A', 'B'), ('C', 'D'), ('E', 'F'))))
>>> df
     A                   B                                                                             
     C         D         C         D                                                                   
     E    F    E    F    E    F    E    F                                                              
0  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN                                                              
1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN                                                              
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
>>> df.dtypes                                                                                          
A  C  E    object                                                                                      
      F    object                                                                                      
   D  E    object                                                                                      
      F    object                                                                                      
B  C  E    object                                                                                      
      F    object                                                                                      
   D  E    object                                                                                      
      F    object 

How would I set the type of all columns E to float64 and all columns F to int64? I.e., so that df.dtypes returns the following:

A  C  E    float64
      F      int64
   D  E    float64
      F      int64
B  C  E    float64
      F      int64
   D  E    float64
      F      int64

I know about DataFrame.astype, and it works fine for singly indexed DataFrames, but how would I use it with a MultiIndex? In the real code the number of columns is much higher: still three levels, but a couple of million columns.

I've been searching the web and the documentation, but I can't find the answer. It feels like I've misunderstood something about the DataFrame concept and that I'm wrong to want this.

Thank you in advance!


1 Answer

Integer columns containing NaN aren't supported in older pandas versions, but starting with v0.24 you can use the nullable integer dtype, Int64. Select the column slices with pd.IndexSlice, then set the dtype like this:

import pandas as pd

pd.__version__
# '0.24.2'

# Cast every column whose innermost level is cval, one label at a time
for cval, dtype in [('E', 'float64'), ('F', 'Int64')]:
    df.loc[:, pd.IndexSlice[:, :, cval]] = (
        df.loc[:, pd.IndexSlice[:, :, cval]].astype(dtype))

df.dtypes
A  C  E    float64
      F      Int64
   D  E    float64
      F      Int64
B  C  E    float64
      F      Int64
   D  E    float64
      F      Int64
dtype: object
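
As a variation on the loop above (not part of the original answer), the dtype for each column can be looked up from its innermost level and passed to DataFrame.astype in a single call. This may be worth trying on recent pandas releases, where assignment through .loc tends to write into the existing columns and may leave the object dtype in place. The names dtype_by_label and dtype_map are made up for this sketch, and the frame is rebuilt here only so the example runs on its own.

```python
import pandas as pd

# Same layout as the question's frame: three column levels, all-NaN object columns.
columns = pd.MultiIndex.from_product((("A", "B"), ("C", "D"), ("E", "F")))
df = pd.DataFrame(index=range(3), columns=columns, dtype=object)

# Hypothetical lookup from the innermost level label to the target dtype.
dtype_by_label = {"E": "float64", "F": "Int64"}

# One entry per full column tuple, chosen by the tuple's last element.
dtype_map = {col: dtype_by_label[col[-1]] for col in df.columns}

# astype returns a new frame, so the result has to be reassigned.
df = df.astype(dtype_map)

print(df.dtypes)
```

Because astype returns a new DataFrame rather than modifying the existing one, the resulting dtypes do not depend on how .loc assignment behaves in the installed pandas version.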

Note that the I in Int64 is capitalized: this distinguishes pandas' nullable integer dtype from NumPy's int64, which cannot hold missing values.
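
The difference between the two spellings can be seen on a small Series with a gap in it. This is only an illustrative sketch; the variable names are arbitrary and do not come from the answer.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, np.nan, 3.0])

# The nullable extension dtype keeps the gap as a missing value.
nullable = values.astype("Int64")

# NumPy-backed int64 has no way to represent the gap, so the cast is rejected.
try:
    values.astype("int64")
    plain_cast_ok = True
except (ValueError, TypeError):
    plain_cast_ok = False

print(nullable.dtype, nullable.isna().tolist(), plain_cast_ok)
```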

Answered by cs95, Sep 04 '25 19:09



