I want to sort a dataframe with many columns by a specific column, but first I need to change its type from object to int. How can I change the data type of this specific column while keeping the original column positions?
You can change a column's type in a pandas DataFrame with the df.astype() method. After creating a DataFrame, you may need to change a column's type, for example to convert it to a numeric format that can be used for modeling and classification.
To convert data types in pandas, there are three basic options: use astype() to force an appropriate dtype; write a custom function to convert the data; or use pandas functions such as to_numeric() or to_datetime().
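As a rough illustration of the three options, here is a minimal sketch. The sample data, the column names and the parse_price helper are all made up for this example and are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data: every column starts out holding strings.
df = pd.DataFrame({
    "count": ["7", "3", "9"],
    "price": ["1.5", "n/a", "2.0"],
    "day": ["2020-01-03", "2020-01-01", "2020-01-02"],
})

# Option 1: astype() forces a dtype and raises if any value cannot be converted.
df["count"] = df["count"].astype(int)

# Option 2: a custom function (hypothetical helper), applied value by value.
def parse_price(text):
    """Return the text as a float, or NaN when it is not a number."""
    try:
        return float(text)
    except ValueError:
        return float("nan")

parsed_price = df["price"].apply(parse_price)

# Option 3: pandas helpers; errors='coerce' turns unparseable values into NaN.
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df["day"] = pd.to_datetime(df["day"])

print(df.dtypes)
```

Option 1 is the strictest of the three, so it is a reasonable default when the data is known to be clean.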
Pandas uses other names for data types than Python, for example: object for textual data. A column in a DataFrame can only have one data type.
df['colname'] = df['colname'].astype(int)
works, at least when changing from float values to int.
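Since the question asks about keeping the column positions, here is a small sketch showing that assigning the converted values back to the same label leaves the column order alone. The frame below is invented for the example.

```python
import pandas as pd

# Hypothetical frame: 'colname' holds numeric strings, 'D' holds floats.
df = pd.DataFrame({"A": [1, 2, 3], "colname": ["7", "3", "9"], "D": [1.0, 3.0, 5.0]})
columns_before = list(df.columns)

# Assigning back to an existing label overwrites that column where it sits.
df["colname"] = df["colname"].astype(int)

# astype() also accepts a {column: dtype} mapping and returns a new frame
# with the same column order.
df = df.astype({"D": int})

columns_after = list(df.columns)
print(columns_before == columns_after)
print(df.dtypes)
```

Note that casting floats to int drops the fractional part, so this is only safe when the floats are whole numbers.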
You can use reindex with the index of the column after casting it to int with astype and sorting it with sort_values:
import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'colname':['7','3','9'],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})
print (df)
A B D E F colname
0 1 4 1 5 7 7
1 2 5 3 3 4 3
2 3 6 5 6 3 9
print (df.colname.astype(int).sort_values())
1 3
0 7
2 9
Name: colname, dtype: int32
print (df.reindex(df.colname.astype(int).sort_values().index))
A B D E F colname
1 2 5 3 3 4 3
0 1 4 1 5 7 7
2 3 6 5 6 3 9
print (df.reindex(df.colname.astype(int).sort_values().index).reset_index(drop=True))
A B D E F colname
0 2 5 3 3 4 3
1 1 4 1 5 7 7
2 3 6 5 6 3 9
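If you are happy to keep the converted values in the frame, an alternative to the reindex approach is to convert the column first and then call sort_values on the whole DataFrame. This is only a sketch of that alternative; the ignore_index argument assumes pandas 1.0 or newer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'colname': ['7', '3', '9'],
                   'D': [1, 3, 5]})

# Convert the column where it sits, so the column order does not change.
df['colname'] = df['colname'].astype(int)

# Sort all rows by the converted column and renumber the index from 0.
sorted_df = df.sort_values(by='colname', ignore_index=True)
print(sorted_df)
```

Unlike the reindex version, the result here holds integers in colname rather than the original strings.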
If the first solution does not work because of None or other bad data, use to_numeric:
df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'colname':['7','3','None'],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})
print (df)
A B D E F colname
0 1 4 1 5 7 7
1 2 5 3 3 4 3
2 3 6 5 6 3 None
print (pd.to_numeric(df.colname, errors='coerce').sort_values())
1 3.0
0 7.0
2 NaN
Name: colname, dtype: float64
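The coerced values can then drive the same reindex trick. The sketch below assumes you want the bad rows at the end (the default for sort_values) and, optionally, a nullable integer column instead of float64; the 'Int64' dtype is a pandas extension type, so treat that last step as one possible choice rather than a requirement.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'colname': ['7', '3', 'None']})

# Unparseable values become NaN, which forces a float64 result.
numeric = pd.to_numeric(df['colname'], errors='coerce')

# sort_values() puts NaN last by default; reuse its index to reorder the rows.
sorted_df = df.reindex(numeric.sort_values().index)
print(sorted_df)

# Optional: store the values as nullable integers, with <NA> for the bad row.
df['colname'] = numeric.astype('Int64')
print(df.dtypes)
```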
I tried the following:
df['column']=df.column.astype('int64')
and it worked for me.
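To change just one column, here is what you can do:
df['column_name'] = df.column_name.apply(int)
You can replace int with another callable for the data type you want, e.g. np.int64 or str. For category, use df.column_name.astype('category') instead, because apply needs a callable.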
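To make the difference concrete, here is a short sketch with an invented column. apply calls a function on each value and returns a new Series, so the result has to be assigned back, while astype takes a dtype and also covers cases such as 'category'.

```python
import pandas as pd

# Hypothetical column of numeric strings.
df = pd.DataFrame({"column_name": ["7", "3", "7"]})

# apply() calls int() on every value and returns a new Series.
as_int = df["column_name"].apply(int)

# astype() takes a dtype (or its name) rather than a function.
as_category = df["column_name"].astype("category")

# Neither call changes df until the result is assigned back.
df["column_name"] = as_int
print(df.dtypes)
```

For plain numeric conversions astype is usually the more direct choice, since it does not call a Python function per value.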
For multiple data type changes when reading the data from a CSV file, I would recommend the following:
df = pd.read_csv(data, dtype={'Col_A': str, 'Col_B': 'int64'})
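Here is a self-contained sketch of the read_csv approach. The CSV text is made up and io.StringIO stands in for a real file path; Col_C is an extra column added only to show that columns missing from the dtype mapping are still inferred.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path.
csv_text = "Col_A,Col_B,Col_C\n001,10,1.5\n002,20,2.5\n"

# Columns named in the mapping get the requested dtype; others are inferred.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Col_A": str, "Col_B": "int64"})

print(df)
print(df.dtypes)
```

Reading Col_A as str keeps the leading zeros, which would be lost if pandas inferred the column as integers.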