
Convert float64 column to int64 in Pandas

I tried to convert a column from data type float64 to int64 using:

df['column name'].astype(int64) 

but got an error:

NameError: name 'int64' is not defined

The column holds a number of people but is formatted as 7500000.0. Any idea how I can simply change this float64 column into int64?

MCG Code asked May 13 '17 18:05



2 Answers

Solution for pandas 0.24+ for converting numeric columns that contain missing values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'column name': [7500000.0, 7500000.0, np.nan]})
print(df['column name'])
0    7500000.0
1    7500000.0
2          NaN
Name: column name, dtype: float64

# casting straight to NumPy int64 fails because of the NaN
df['column name'] = df['column name'].astype(np.int64)

ValueError: Cannot convert non-finite values (NA or inf) to integer

# http://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
df['column name'] = df['column name'].astype('Int64')
print(df['column name'])
0    7500000
1    7500000
2        NaN
Name: column name, dtype: Int64
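Note that the capital "I" matters here. As a rough sketch (the variable names are just for illustration), 'Int64' selects the pandas nullable integer dtype, while lowercase 'int64' selects the NumPy dtype, which has no way to store a missing value:

```python
import numpy as np
import pandas as pd

s = pd.Series([7500000.0, np.nan], name="column name")

# Capital "I": pandas nullable extension dtype, the missing value survives the cast.
nullable = s.astype("Int64")

# Lowercase "i": NumPy int64, which cannot represent a missing value.
try:
    s.astype("int64")
    numpy_cast_failed = False
except ValueError:
    numpy_cast_failed = True

print(nullable.dtype)         # Int64
print(nullable.isna().sum())  # 1
print(numpy_cast_failed)      # True
```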

I think you need to cast to numpy.int64:

df['column name'].astype(np.int64) 

Sample:

df = pd.DataFrame({'column name': [7500000.0, 7500000.0]})
print(df['column name'])
0    7500000.0
1    7500000.0
Name: column name, dtype: float64

df['column name'] = df['column name'].astype(np.int64)
# same as the line below in older pandas (pd.np was removed in pandas 2.0)
# df['column name'] = df['column name'].astype(pd.np.int64)
print(df['column name'])
0    7500000
1    7500000
Name: column name, dtype: int64

If the column contains NaNs, first replace them with some integer (e.g. 0) using fillna, because NaN is a float and cannot be stored in an int64 column:

df = pd.DataFrame({'column name': [7500000.0, np.nan]})

df['column name'] = df['column name'].fillna(0).astype(np.int64)
print(df['column name'])
0    7500000
1          0
Name: column name, dtype: int64

Also check the documentation on missing data casting rules.
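One thing the fillna approach does not guard against is a column that holds fractional values, since astype simply drops the fraction. A minimal sketch of a stricter conversion, assuming you would rather get an error than lose the fraction silently (float_to_int64 and fill_value are hypothetical names, not part of pandas):

```python
import numpy as np
import pandas as pd

def float_to_int64(series, fill_value=0):
    """Hypothetical helper: fill NaN, then cast only if no fraction would be lost."""
    filled = series.fillna(fill_value)
    # Compare each value with its floor to detect fractional parts.
    if not (filled == np.floor(filled)).all():
        raise ValueError("column has fractional values; round or truncate explicitly first")
    return filled.astype(np.int64)

df = pd.DataFrame({"column name": [7500000.0, np.nan]})
df["column name"] = float_to_int64(df["column name"])
print(df["column name"].tolist())  # [7500000, 0]
```

Whether 0 is a sensible replacement depends on the data. For a head count it may be misleading, in which case the nullable 'Int64' dtype is probably the better fit.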

EDIT:

Casting the underlying NumPy values (.values) when NaNs are present silently gives wrong results:

df = pd.DataFrame({'column name': [7500000.0, np.nan]})

df['column name'] = df['column name'].values.astype(np.int64)
print(df['column name'])
0                7500000
1   -9223372036854775808
Name: column name, dtype: int64

jezrael answered Sep 23 '22 20:09
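The reason is that pandas checks for non-finite values in Series.astype, while a cast on the raw array bypasses that check. If you do need to work on the raw array, here is a small sketch of doing the check yourself (values_to_int64 is a hypothetical helper name):

```python
import numpy as np
import pandas as pd

def values_to_int64(series):
    """Hypothetical helper: cast the raw NumPy values, refusing NaN and infinity."""
    values = series.to_numpy(dtype="float64")
    if not np.isfinite(values).all():
        raise ValueError("non-finite values present; handle them before casting")
    return values.astype(np.int64)

clean = pd.Series([7500000.0, 12.0])
dirty = pd.Series([7500000.0, np.nan])

print(values_to_int64(clean).tolist())  # [7500000, 12]

try:
    values_to_int64(dirty)
except ValueError as exc:
    print("refused:", exc)
```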


You need to pass in the string 'int64':

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1.0, 2.0]})  # some test dataframe

>>> df['a'].astype('int64')
0    1
1    2
Name: a, dtype: int64

There are some alternative ways to specify 64-bit integers:

>>> df['a'].astype('i8')      # integer with 8 bytes (64 bit)
0    1
1    2
Name: a, dtype: int64

>>> import numpy as np
>>> df['a'].astype(np.int64)  # native numpy 64 bit integer
0    1
1    2
Name: a, dtype: int64
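These spellings all resolve to the same NumPy dtype, which is why the three calls above print the same result. A quick way to check this yourself (the list name specs is just for illustration):

```python
import numpy as np

specs = ["int64", "i8", np.int64]
dtypes = [np.dtype(spec) for spec in specs]

# Every spelling resolves to the same 8-byte integer dtype.
print(all(dt == np.dtype("int64") for dt in dtypes))  # True
print(np.dtype("i8").itemsize)                        # 8
```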

Or use np.int64 directly on your column (but note that it returns a numpy.ndarray, not a Series):

>>> np.int64(df['a'])
array([1, 2], dtype=int64)

MSeifert answered Sep 22 '22 20:09
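Because the result is a plain array, the index labels and the column name are lost. If you need them back, one option is to wrap the array in a new Series, sketched here with a made-up labelled index:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0], index=["x", "y"], name="a")

arr = np.int64(s)  # plain ndarray: the index labels and the name are gone
restored = pd.Series(arr, index=s.index, name=s.name)

print(type(arr).__name__)  # ndarray
print(restored.tolist())   # [1, 2]
```

In most cases s.astype('int64') is simpler, since it keeps the index and name without the extra step.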