I tried to convert a column from data type float64 to int64 using:
df['column name'].astype(int64)
but got an error:
NameError: name 'int64' is not defined
The column holds a number of people but is formatted as 7500000.0. Any idea how I can simply change this float64 into int64?
A common way to convert one or more columns of a DataFrame to numeric values is pandas.to_numeric(), applied per column since it takes a Series or other 1-D input. This function will try to change non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
To convert a column that includes a mixture of float and NaN values to int, first replace the NaN values with zero and then use astype() to convert. Use DataFrame.fillna() to replace the NaN values with the integer value zero.
Alternatively, you can convert string columns to a numeric type using pandas.to_numeric(). For example, use df['Discount'] = pd.to_numeric(df['Discount']) to convert the 'Discount' column to a numeric dtype (float64, or int64 if every value parses as a whole number).
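As a rough illustration of the to_numeric() approach described above, here is a minimal sketch. The sample values are made up for the example, and errors="coerce" is an optional choice that turns unparseable strings into NaN instead of raising:

```python
import pandas as pd

# Hypothetical data: the 'Discount' column holds numbers stored as strings,
# plus one value that cannot be parsed as a number.
df = pd.DataFrame({"Discount": ["1000", "2300.5", "n/a"]})

# errors="coerce" replaces unparseable values with NaN instead of raising.
df["Discount"] = pd.to_numeric(df["Discount"], errors="coerce")

print(df["Discount"].dtype)
print(df["Discount"].isna().sum())  # number of values that could not be parsed
```

Because the coerced column now contains a NaN, it would still need fillna() or the nullable 'Int64' dtype before it could become an integer column.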
Solution for pandas 0.24+ for converting numeric columns with missing values:
df = pd.DataFrame({'column name':[7500000.0,7500000.0, np.nan]})
print (df['column name'])
0    7500000.0
1    7500000.0
2          NaN
Name: column name, dtype: float64

df['column name'] = df['column name'].astype(np.int64)

ValueError: Cannot convert non-finite values (NA or inf) to integer
#http://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
df['column name'] = df['column name'].astype('Int64')
print (df['column name'])
0    7500000
1    7500000
2        NaN
Name: column name, dtype: Int64

I think you need to cast to numpy.int64:
df['column name'].astype(np.int64)

Sample:
df = pd.DataFrame({'column name':[7500000.0,7500000.0]})
print (df['column name'])
0    7500000.0
1    7500000.0
Name: column name, dtype: float64

df['column name'] = df['column name'].astype(np.int64)
#the pd.np alias was removed in pandas 2.0, so use np.int64 rather than pd.np.int64
print (df['column name'])
0    7500000
1    7500000
Name: column name, dtype: int64

If the column contains NaNs, replace them with some int (e.g. 0) using fillna first, because NaN is a float:
df = pd.DataFrame({'column name':[7500000.0,np.nan]})

df['column name'] = df['column name'].fillna(0).astype(np.int64)
print (df['column name'])
0    7500000
1          0
Name: column name, dtype: int64

Also check the documentation on missing data casting rules.
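One related thing worth checking before casting: astype truncates the fractional part toward zero rather than rounding. That makes no difference for values like 7500000.0, but it matters if the column holds non-whole numbers. A small sketch with made-up values (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical values, including two that are not whole numbers.
s = pd.Series([7500000.0, 12.9, -3.7], name="column name")

# astype drops the fractional part (truncation toward zero), it does not round.
truncated = s.astype(np.int64)

# Round first if nearest-integer behaviour is what you want.
rounded = s.round().astype(np.int64)

print(truncated.tolist())  # [7500000, 12, -3]
print(rounded.tolist())    # [7500000, 13, -4]
```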
EDIT:
Converting values with NaNs through .values is buggy: no error is raised, and the NaN becomes an arbitrary integer:
df = pd.DataFrame({'column name':[7500000.0,np.nan]})

df['column name'] = df['column name'].values.astype(np.int64)
print (df['column name'])
0                7500000
1   -9223372036854775808
Name: column name, dtype: int64
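To avoid ending up with a garbage value like the one above, one option is to check for missing values first and pick the dtype accordingly. This is only a sketch: to_int64 is a hypothetical helper name, not a pandas function, and it assumes pandas 0.24+ for the nullable 'Int64' dtype:

```python
import numpy as np
import pandas as pd

def to_int64(series):
    """Hypothetical helper: cast a float Series to a 64-bit integer dtype."""
    if series.isna().any():
        # Nullable extension dtype (pandas 0.24+) keeps the missing values.
        return series.astype("Int64")
    # No missing values, so the plain NumPy dtype is safe.
    return series.astype(np.int64)

with_nan = to_int64(pd.Series([7500000.0, np.nan]))
without_nan = to_int64(pd.Series([7500000.0, 7500000.0]))

print(with_nan.dtype)     # Int64
print(without_nan.dtype)  # int64
```

If zero is a meaningful replacement for the missing entries, fillna(0) followed by astype(np.int64) remains the simpler choice.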
You need to pass in the string 'int64':
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1.0, 2.0]})  # some test dataframe

>>> df['a'].astype('int64')
0    1
1    2
Name: a, dtype: int64

There are some alternative ways to specify 64-bit integers:
>>> df['a'].astype('i8')      # integer with 8 bytes (64 bit)
0    1
1    2
Name: a, dtype: int64

>>> import numpy as np
>>> df['a'].astype(np.int64)  # native numpy 64 bit integer
0    1
1    2
Name: a, dtype: int64

Or use np.int64 directly on your column (but it returns a NumPy array rather than a Series):
>>> np.int64(df['a'])
array([1, 2], dtype=int64)
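Note also that astype returns a new Series and leaves the DataFrame untouched, so the result has to be assigned back to the column for the change to stick. A minimal sketch with made-up values (still_float and now_int are just illustrative names):

```python
import pandas as pd

df = pd.DataFrame({"column name": [7500000.0, 1200000.0]})

# The result is discarded here, so the column is still float64 afterwards.
df["column name"].astype("int64")
still_float = df["column name"].dtype

# Assigning the result back is what actually changes the column.
df["column name"] = df["column name"].astype("int64")
now_int = df["column name"].dtype

print(still_float)  # float64
print(now_int)      # int64
```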