
Change data type of a specific column of a pandas dataframe

Tags:

python

pandas

I want to sort a dataframe with many columns by a specific column, but first I need to change type from object to int. How to change the data type of this specific column while keeping the original column positions?

DougKruger asked Jan 11 '17 12:01


People also ask

How do I change the datatype of a specific column in pandas?

You can change a column's type in a pandas dataframe with the astype() method. A common reason is converting a column to a numeric format that can be used for modeling and classification.

How do I change DataFrame data type in pandas?

In order to convert data types in pandas, there are three basic options: Use astype() to force an appropriate dtype. Create a custom function to convert the data. Use pandas functions such as to_numeric() or to_datetime()

Can pandas column have different data types?

Pandas uses other names for data types than Python, for example: object for textual data. A column in a DataFrame can only have one data type.


4 Answers

df['colname'] = df['colname'].astype(int) works, at least when converting float values to int.
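
As a minimal sketch of the assignment above applied to the question's case (an object column of numeric strings), the snippet below casts the column, checks that it stays in the same position, and then sorts by it. The frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: 'colname' holds numeric strings, so it is not an int column.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "colname": ["7", "3", "10"],
    "D": [1, 3, 5],
})

# Assigning back to the same label overwrites the values of the existing
# column, so the column order of the frame is unchanged.
df["colname"] = df["colname"].astype(int)

# Sorting now compares numbers; as strings, "10" would sort before "3".
df_sorted = df.sort_values("colname")
```

Note that astype(int) raises an error if any value cannot be parsed as an integer.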

JimmyOnThePage answered Oct 20 '22 23:10


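You can cast the column to int with astype, sort the result with sort_values, and then reindex the dataframe by the sorted index. The stored column keeps its original type and position: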

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'colname':['7','3','9'],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df)
   A  B  D  E  F colname
0  1  4  1  5  7       7
1  2  5  3  3  4       3
2  3  6  5  6  3       9

print (df.colname.astype(int).sort_values())
1    3
0    7
2    9
Name: colname, dtype: int32

print (df.reindex(df.colname.astype(int).sort_values().index))
   A  B  D  E  F colname
1  2  5  3  3  4       3
0  1  4  1  5  7       7
2  3  6  5  6  3       9

print (df.reindex(df.colname.astype(int).sort_values().index).reset_index(drop=True))
   A  B  D  E  F colname
0  2  5  3  3  4       3
1  1  4  1  5  7       7
2  3  6  5  6  3       9
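
A possible shortcut for the same idea, assuming pandas 1.1 or later, is the key argument of sort_values. This sketch is not part of the original answer; it sorts by the integer value of the column without changing what is stored in the frame. The data is made up for illustration.

```python
import pandas as pd

# Hypothetical data: 'colname' holds numeric strings.
df = pd.DataFrame({"A": [1, 2, 3], "colname": ["7", "3", "10"]})

# key receives the column as a Series and must return a Series of the
# same length; the returned values are used only for ordering.
df_sorted = df.sort_values("colname", key=lambda s: s.astype(int))
```

The values in df_sorted are still strings, only the row order follows the numeric interpretation.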

If the first solution does not work because of None or bad data, use to_numeric with errors='coerce', which converts unparseable values to NaN:

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'colname':['7','3','None'],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df)
   A  B  D  E  F colname
0  1  4  1  5  7       7
1  2  5  3  3  4       3
2  3  6  5  6  3    None

print (pd.to_numeric(df.colname, errors='coerce').sort_values())
1    3.0
0    7.0
2    NaN
Name: colname, dtype: float64
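
Because NaN forces the result to float64, one option for still getting integers is the nullable Int64 dtype. The following is a sketch under that assumption, not part of the original answer; the data mirrors the example above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "colname": ["7", "3", "None"]})

# Values that cannot be parsed ("None" here) become NaN, giving float64.
numeric = pd.to_numeric(df["colname"], errors="coerce")

# The nullable integer dtype stores the parsed values as integers and
# the missing one as <NA>.
df["colname"] = numeric.astype("Int64")

# Missing values are placed last by default.
df_sorted = df.sort_values("colname")
```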

jezrael answered Oct 20 '22 23:10


I tried the following:

df['column']=df.column.astype('int64')

and it worked for me.

user19120 answered Oct 20 '22 23:10


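To change a single column, apply the conversion and assign the result back: df['column_name'] = df['column_name'].apply(int)

You can replace int with another callable such as np.int64 or str. For category, use df['column_name'].astype('category') instead, because apply expects a function.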

For multiple datatype changes, I would recommend setting the types when the file is read:

df = pd.read_csv(data, dtype={'Col_A': str, 'Col_B': 'int64'})
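
A self-contained sketch of the read_csv approach follows. The CSV contents are invented, and an in-memory buffer stands in for the file path that data would normally hold.

```python
import io

import pandas as pd

# Stand-in for a real file; the contents are made up for illustration.
data = io.StringIO("Col_A,Col_B\n001,7\n002,3\n")

# dtype maps column names to the types they should be parsed as.
# Reading Col_A as str preserves the leading zeros.
df = pd.read_csv(data, dtype={"Col_A": str, "Col_B": "int64"})
```

This only helps when the data comes from a file; for a dataframe that already exists, df.astype accepts the same kind of column-to-type dictionary.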

sargupta answered Oct 20 '22 22:10