Say I have this MultiIndex DataFrame:
>>> import pandas
>>> df = pandas.DataFrame(index=range(3), columns=pandas.MultiIndex.from_product(
...     (('A', 'B'), ('C', 'D'), ('E', 'F'))))
>>> df
     A                   B
     C         D         C         D
     E    F    E    F    E    F    E    F
0  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
>>> df.dtypes
A  C  E    object
      F    object
   D  E    object
      F    object
B  C  E    object
      F    object
   D  E    object
      F    object
How would I set the type of all columns E to float64 and all columns F to int64? I.e., so that df.dtypes returns the following:
A  C  E    float64
      F      int64
   D  E    float64
      F      int64
B  C  E    float64
      F      int64
   D  E    float64
      F      int64
I know about DataFrame.astype, and it works fine for singly indexed DataFrames, but how would I use it with a MultiIndex? In the real code the number of columns is much higher: still three levels, but a couple of million columns.
I've been searching the web and the documentation, but I can't find the answer. It feels like I've misunderstood something about the DataFrame concept and that I'm wrong in wanting what I want.
Thank you in advance!
Integer columns that contain NaN aren't supported by the plain int64 dtype, but starting from v0.24 you can use the nullable integer dtype Int64 instead. Select the column slices using pd.IndexSlice, then set the type like this:
import pandas as pd
pd.__version__
# '0.24.2'
for cval, dtype in [('E', 'float64'), ('F', 'Int64')]:
    df.loc[:, pd.IndexSlice[:, :, cval]] = (
        df.loc[:, pd.IndexSlice[:, :, cval]].astype(dtype))
df.dtypes
A  C  E    float64
      F      Int64
   D  E    float64
      F      Int64
B  C  E    float64
      F      Int64
   D  E    float64
      F      Int64
dtype: object
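An alternative sketch, not part of the answer above: DataFrame.astype also accepts a mapping from column label to dtype, so you can build that mapping from the last element of each column tuple and convert everything in one call. This sidesteps assigning back through .loc, which on newer pandas releases may write into the existing object columns and leave the dtypes unchanged (an assumption worth checking on your version). The names dtype_by_leaf, dtype_map and converted are made up for illustration.

```python
import pandas as pd

# Same frame as in the question: three column levels, all-NaN columns.
columns = pd.MultiIndex.from_product((("A", "B"), ("C", "D"), ("E", "F")))
df = pd.DataFrame(index=range(3), columns=columns)

# Hypothetical lookup table: innermost column label -> target dtype.
dtype_by_leaf = {"E": "float64", "F": "Int64"}

# Each column label is a tuple such as ('A', 'C', 'E'); key on its last element.
dtype_map = {col: dtype_by_leaf[col[-1]] for col in df.columns}

# astype returns a new DataFrame rather than modifying df in place.
converted = df.astype(dtype_map)

print(converted.dtypes)
```

With millions of columns the dict comprehension is itself a Python-level loop, so it is probably worth timing this on a realistic subset before relying on it.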
Note the capital I in Int64: it denotes pandas' nullable integer dtype, as opposed to NumPy's int64, which cannot hold NaN.
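To make the difference concrete, here is a small sketch on a throwaway Series (the variable names are only for illustration): casting a column with a missing value to NumPy's int64 is expected to fail, while the nullable Int64 dtype keeps the gap as a missing value.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# NumPy's int64 has no representation for a missing value, so this cast fails.
try:
    s.astype("int64")
    numpy_cast_ok = True
except (ValueError, TypeError) as exc:
    numpy_cast_ok = False
    print("int64 cast failed:", exc)

# The nullable extension dtype stores the gap as a missing value instead.
nullable = s.astype("Int64")
print(nullable.dtype)
print(nullable.isna().tolist())
```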