TL;DR: I'd like to change the data types of pandas dataframe columns in-place.
I have a pandas dataframe:
df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6.1]})
Which by default gets its columns assigned 'int64' and 'float64' on my system:
df.dtypes
Out[172]:
a int64
b float64
dtype: object
Because my dataframe will be very large, I'd like to set the column data types, after having created the dataframe, to int32 and float32. I know how I could do this:
df['a'] = df['a'].astype(np.int32)
df['b'] = df['b'].astype(np.float32)
or, in one step:
df = df.astype({'a':np.int32, 'b':np.float32})
and the dtypes of my dataframe are indeed:
df.dtypes
Out[180]:
a int32
b float32
dtype: object
However, having to reassign the series seems clunky, especially since many pandas methods accept an inplace
kwarg. Passing it to astype doesn't seem to work, though (starting out with the same dataframe as at the top):
df['a'].astype(np.int32, inplace=True)
df.dtypes
Out[187]:
a int64
b float64
dtype: object
Is there something I'm overlooking here? Is this by design? The same behaviour is shown when working with Series instead of DataFrame objects.
Many thanks,
Change data type of a series in Pandas
Use a numpy.dtype or Python type to cast an entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type, to cast one or more of the DataFrame's columns to column-specific types.
You can change the column type in a pandas dataframe using the df.astype() method. Once you have created a dataframe, you may need to change a column's type, for example to convert it to a numeric format that can be used directly for modeling and classification.
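As a rough illustration of the two forms of astype described above, here is a minimal sketch using the dataframe from the question. The variable names (converted, all_float, before, after) are only for illustration. Note that astype returns a new object in both cases, which is why the result has to be assigned to something.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6.1]})

# A {column: dtype} mapping casts each listed column to its own type.
# astype returns a new DataFrame; df itself is left untouched.
converted = df.astype({'a': np.int32, 'b': np.float32})

# A single dtype casts every column to that one type.
all_float = df.astype(np.float32)

# Compare the memory taken by the column data before and after the cast.
before = df.memory_usage(index=False).sum()
after = converted.memory_usage(index=False).sum()
print(before, after)
```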
The best way to convert one or more columns of a DataFrame to numeric values is to use pandas.to_numeric(). This function will try to change non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
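A small sketch of how pandas.to_numeric() might be used, including its downcast option, which is relevant to the memory concern in the question. The sample values and variable names are made up for illustration.

```python
import pandas as pd

# Strings that look like numbers are parsed into a numeric Series.
numbers = pd.to_numeric(pd.Series(['1', '2', '3']))

# errors='coerce' turns values that cannot be parsed into NaN instead of raising.
mixed = pd.to_numeric(pd.Series(['1', 'x', '3']), errors='coerce')

# downcast asks pandas for the smallest dtype that can hold the values
# (np.int8 at the smallest for 'integer', np.float32 for 'float').
small_int = pd.to_numeric(pd.Series([1, 2, 3]), downcast='integer')
small_float = pd.to_numeric(pd.Series([4, 5, 6.1]), downcast='float')

print(numbers.dtype, mixed.dtype, small_int.dtype, small_float.dtype)
```

Like astype, to_numeric returns a new Series, so the result still has to be assigned back to the column.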
Just as you can't attach a specific data type to a Python list, even if all its elements are of the same type, a pandas Series of dtype object holds pointers to objects of any type.
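To make the point about object series concrete, here is a short sketch with made-up values. The series reports a single dtype, object, while each element keeps its own Python type.

```python
import pandas as pd

# One Series holding an int, a str and a float.
mixed = pd.Series([1, 'two', 3.0])

print(mixed.dtype)                         # object
print([type(v).__name__ for v in mixed])   # ['int', 'str', 'float']
```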
You need to create a new list of your columns in the desired order, then use df = df[cols] to rearrange the columns in this new order.
You can write your own (still clunky) inplace versions:
from typing import Dict
import pandas as pd

def astype_inplace(df: pd.DataFrame, dct: Dict):
    df[list(dct.keys())] = df.astype(dct)[list(dct.keys())]

def astype_per_column(df: pd.DataFrame, column: str, dtype):
    df[column] = df[column].astype(dtype)
and use them like this:
astype_inplace(df, {'bool_col':'boolean'})
or
astype_per_column(df, 'bool_col', 'boolean')
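To check that a helper of this kind really changes the caller's dataframe, here is a sketch applied to the frame from the question. cast_columns is a hypothetical name, and it uses per-column reassignment rather than the exact code above. Each astype call still allocates a new column, so this is "in place" only in the sense that the same DataFrame object is kept.

```python
import numpy as np
import pandas as pd

def cast_columns(df: pd.DataFrame, dtypes: dict) -> None:
    """Hypothetical helper: reassign each listed column on the same DataFrame."""
    for column, dtype in dtypes.items():
        df[column] = df[column].astype(dtype)

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6.1]})
alias = df  # a second name for the same DataFrame object

cast_columns(df, {'a': np.int32, 'b': np.float32})

# The change is visible through every reference to the object.
print(alias.dtypes)
```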
And what about
>>> df.__dict__.update(df.astype({'a': np.int32, 'b': np.float32}).__dict__)
>>> df.dtypes
a int32
b float32
dtype: object
?