Change type of pandas series/dataframe column inplace

Tags:

python

pandas

TL;DR: I'd like to change the data types of pandas dataframe columns in-place.


I have a pandas dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6.1]})

By default, its columns are assigned 'int64' and 'float64' on my system:

df.dtypes
Out[172]: 
a      int64
b    float64
dtype: object

Because my dataframe will be very large, I'd like to change the column data types to int32 and float32 after creating the dataframe. I know I could do this:

df['a'] = df['a'].astype(np.int32)
df['b'] = df['b'].astype(np.float32)

or, in one step:

df = df.astype({'a':np.int32, 'b':np.float32})

and the dtypes of my dataframe are indeed:

df.dtypes
Out[180]: 
a      int32
b    float32
dtype: object
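Since the motivation here is memory, `DataFrame.memory_usage` is a quick way to check what the cast actually saves. A minimal sketch on the question's three-row frame: for numeric columns the reported size is rows × itemsize, so going from 8-byte to 4-byte types halves each column's footprint.

```python
import numpy as np
import pandas as pd

# The question's small frame; the savings scale with the number of rows.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6.1]})

# Bytes held by each column's data (index excluded).
before = df.memory_usage(index=False)

df = df.astype({'a': np.int32, 'b': np.float32})
after = df.memory_usage(index=False)

print(before['a'], after['a'])  # 24 12
print(before['b'], after['b'])  # 24 12
```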

However, having to reassign the series seems clunky, especially since many pandas methods have an inplace kwarg. Passing it to astype, though, doesn't seem to work (starting out with the same dataframe as at the top):

df['a'].astype(np.int32, inplace=True)

df.dtypes
Out[187]: 
a      int64
b    float64
dtype: object

Is there something I'm overlooking here? Is this by design? The same behaviour is shown when working with Series instead of DataFrame objects.

Many thanks,

asked Mar 26 '19 at 10:03 by ElRudi
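The behaviour is by design in the sense that `astype` always returns a new object: its parameters are `dtype`, `copy` and `errors`, and there is no `inplace`. A minimal sketch that makes this visible; the variable names are illustrative, and the exact error text depends on the pandas version, so it is printed rather than relied upon.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6.1]})

# astype never modifies its caller; it returns a new object with the new dtype.
original_dtype = df['a'].dtype
converted = df['a'].astype(np.int32)
print(original_dtype, converted.dtype)

# astype has no inplace parameter. Recent pandas versions reject the unknown
# keyword with a TypeError; the version in the question silently ignored it.
try:
    df['a'].astype(np.int32, inplace=True)
except TypeError as exc:
    print('TypeError:', exc)

# The column only changes once the result is assigned back.
df['a'] = converted
print(df['a'].dtype)  # int32
```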


People also ask

How do I change the datatype of a panda series?

Use a numpy.dtype or Python type to cast the entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type, to cast one or more of the DataFrame's columns to column-specific types.
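Both forms can be sketched briefly; the data and column names below are made up for illustration. A Python type such as `float` is accepted wherever a numpy dtype is, and the mapping form only touches the columns it lists.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# A Python type is accepted in place of a numpy dtype; float maps to float64.
as_float = s.astype(float)
print(as_float.dtype)  # float64

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.1]})
# {column: dtype} casts only the listed columns; 'b' keeps its dtype.
df2 = df.astype({'a': 'float32'})
print(df2['a'].dtype, df2['b'].dtype)  # float32 float64
```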

How do I change the column type in pandas?

You can change the column type in a pandas DataFrame using the df.astype() method. After creating a dataframe, you may need to change a column's type, for example to convert a column to a numeric format that can be used directly for modeling and classification.

How do I change the datatype of multiple columns in pandas?

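For converting columns to numeric values, use pandas.to_numeric(). It tries to change non-numeric objects (such as strings) into integers or floating-point numbers as appropriate. It works on one Series (column) at a time, so to convert several columns, apply it to each of them. For columns that already hold numbers, df.astype({col: dtype, …}) changes several columns in one call.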
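A minimal sketch of converting several string columns at once, assuming made-up column names `x`, `y` and `label`. It uses a plain loop over the target columns, one straightforward way to apply `pd.to_numeric` column by column.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['1', '2', '3'],         # integers stored as strings
    'y': ['1.5', 'oops', '2.5'],  # one value that cannot be parsed
    'label': ['p', 'q', 'r'],     # not converted
})

# pd.to_numeric works on one Series at a time, so loop over the columns.
# errors='coerce' turns unparseable values into NaN instead of raising.
for col in ['x', 'y']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.dtypes)
```

With the default `errors='raise'`, the `'oops'` entry would make the call fail instead.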

Can pandas series have different data types?

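A Series always has a single dtype, but with the object dtype its elements can be of different Python types. Just as you can't attach a specific data type to a list, even if all its elements are of the same type, an object-dtype pandas Series holds pointers to objects of any number of types.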
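A small illustration with made-up values: mixing types forces the object dtype, and the individual elements keep their own Python types.

```python
import pandas as pd

mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)                        # object
print([type(v).__name__ for v in mixed])  # ['int', 'str', 'float']

# Homogeneous data can also live in an object Series if you ask for it.
ints_as_objects = pd.Series([1, 2, 3], dtype=object)
print(ints_as_objects.dtype)              # object
```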

How do I change the column place in a DataFrame?

You need to create a new list of your columns in the desired order, then use df = df[cols] to rearrange the columns in this new order.
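As a sketch, with hypothetical columns `a`, `b`, `c`, moving the last column to the front looks like this. Note that `df[cols]` returns a new DataFrame, so the result has to be assigned back.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# Build the desired order as a list, here: last column first.
cols = [df.columns[-1]] + list(df.columns[:-1])
df = df[cols]
print(list(df.columns))  # ['c', 'a', 'b']
```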


2 Answers

You can write your own (still clunky) inplace versions:

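from typing import Dict
import pandas as pd

def astype_inplace(df: pd.DataFrame, dct: Dict):
    # Cast the listed columns and write them back into the caller's DataFrame
    df[list(dct.keys())] = df.astype(dct)[list(dct.keys())]

def astype_per_column(df: pd.DataFrame, column: str, dtype):
    # Replace one column with its cast version, mutating the caller's DataFrame
    df[column] = df[column].astype(dtype)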

and use them like

astype_inplace(df, {'bool_col':'boolean'})

or

astype_per_column(df, 'bool_col', 'boolean')
answered Sep 30 '22 at 05:09 by Philipp
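What such a helper buys over `df = df.astype(...)` is that the same object is modified, so every other reference to it sees the new dtype. A minimal sketch, reusing the `astype_per_column` helper from the answer above and a hypothetical second reference named `alias`:

```python
import numpy as np
import pandas as pd

def astype_per_column(df: pd.DataFrame, column: str, dtype):
    # Helper from the answer: assigning a column mutates the passed-in object.
    df[column] = df[column].astype(dtype)

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6.1]})
alias = df  # a second reference to the same DataFrame

astype_per_column(df, 'a', np.int32)
print(alias['a'].dtype)  # int32: the alias sees the change

# Rebinding with astype creates a new object; the alias keeps the old dtype.
df = df.astype({'b': np.float32})
print(df['b'].dtype, alias['b'].dtype)  # float32 float64
print(df is alias)  # False
```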


And what about the following?

>>> df.__dict__.update(df.astype({'a': np.int32, 'b': np.float32}).__dict__)
>>> df.dtypes
a      int32
b    float32
dtype: object

answered Sep 30 '22 at 04:09 by keepAlive
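The `__dict__` trick keeps the object's identity because it copies the converted frame's instance attributes, including pandas' private internal data manager, onto the existing object. A minimal sketch checking that, with a hypothetical `alias` reference; keep in mind it relies on pandas implementation details rather than public API, so it may stop working in a future release.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6.1]})
alias = df  # another reference to the same object

# Overwrite df's instance attributes (private internals included) with those
# of the converted copy. Not public API: behaviour may change between versions.
df.__dict__.update(df.astype({'a': np.int32, 'b': np.float32}).__dict__)

print(alias is df)  # True: still the same object
print(alias.dtypes)
```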