I tried to convert a column from data type float64 to int64 using:
df['column name'].astype(int64)
but got an error:
NameError: name 'int64' is not defined
The column holds numbers of people but is formatted as 7500000.0. Any idea how I can simply change this float64 into int64?
to_numeric(): The best way to convert one or more columns of a DataFrame to numeric values is to use pandas.to_numeric(). This function tries to convert non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
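A minimal sketch of pd.to_numeric on data like the question's, assuming the population counts arrived as strings (the sample values are made up). errors="coerce" and downcast="integer" are documented to_numeric parameters. Coercion silently turns bad entries into NaN, so use it only if that is acceptable.

```python
import pandas as pd

# Hypothetical population counts read in as strings, one of them malformed
s = pd.Series(["7500000", "1200000", "unknown"])

# errors="coerce" turns unparseable entries into NaN instead of raising;
# the NaN forces the result to float64
numeric = pd.to_numeric(s, errors="coerce")
print(numeric.dtype)  # float64

# Whole-number float input can be downcast to the smallest integer dtype
# that holds all values
clean = pd.to_numeric(pd.Series([7500000.0, 1200000.0]), downcast="integer")
print(clean.dtype)
```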
To convert a column that contains a mix of float and NaN values to int, first replace the NaN values with zero using DataFrame.fillna(), then convert with astype().
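The fillna-then-astype pattern also works on several columns at once. The column names below are hypothetical. Note that after filling, 0 can no longer be told apart from a genuine zero count.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with two float columns that contain NaN
df = pd.DataFrame({
    "population": [7500000.0, np.nan, 120.0],
    "households": [np.nan, 3000.0, 45.0],
})

cols = ["population", "households"]
# Replace NaN with 0 first (NaN has no int64 representation),
# then cast every selected column to int64
df[cols] = df[cols].fillna(0).astype("int64")
print(df.dtypes)
```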
Alternatively, you can convert string columns to a numeric type with pandas.to_numeric(). For example, df['Discount'] = pd.to_numeric(df['Discount']) converts the 'Discount' column to a numeric dtype (float64 if its strings contain decimal values).
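pd.to_numeric works on one Series at a time, so for several string columns a common pattern is DataFrame.apply(pd.to_numeric). The column names here are hypothetical. Each column ends up float64 or int64 depending on its contents, so "to float" is not guaranteed:

```python
import pandas as pd

# Hypothetical DataFrame where the numbers were read in as strings
df = pd.DataFrame({
    "Discount": ["0.1", "0.25", "0"],
    "Quantity": ["3", "10", "7"],
})

cols = ["Discount", "Quantity"]
# apply() calls pd.to_numeric on each column separately,
# so every column gets its own inferred numeric dtype
df[cols] = df[cols].apply(pd.to_numeric)
print(df.dtypes)
```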
Solution for pandas 0.24+ for converting numeric columns with missing values (note the capital I in 'Int64'):
import numpy as np
import pandas as pd

df = pd.DataFrame({'column name':[7500000.0,7500000.0, np.nan]})
print(df['column name'])
0    7500000.0
1    7500000.0
2          NaN
Name: column name, dtype: float64

df['column name'] = df['column name'].astype(np.int64)
ValueError: Cannot convert non-finite values (NA or inf) to integer

#http://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
df['column name'] = df['column name'].astype('Int64')
print(df['column name'])
0    7500000
1    7500000
2        NaN
Name: column name, dtype: Int64
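A short sketch of how the nullable Int64 dtype behaves on made-up values. In pandas 1.0 and later the missing value is shown as <NA> rather than NaN. The dtype also refuses to cast floats that are not whole numbers, so round explicitly if that is what you want:

```python
import numpy as np
import pandas as pd

s = pd.Series([7500000.0, np.nan])

# Capital-I "Int64" is pandas' nullable integer dtype; lowercase "int64" is NumPy's
nullable = s.astype("Int64")
print(nullable.dtype)  # Int64

# Non-integral floats are refused instead of being silently truncated
refused = False
try:
    pd.Series([1.7, np.nan]).astype("Int64")
except (TypeError, ValueError):
    refused = True
print("refused:", refused)

# Round explicitly first if a nearest-integer result is what you want
rounded = pd.Series([1.7, np.nan]).round().astype("Int64")
```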
I think you need to cast to numpy.int64:
df['column name'].astype(np.int64)
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame({'column name':[7500000.0,7500000.0]})
print(df['column name'])
0    7500000.0
1    7500000.0
Name: column name, dtype: float64

df['column name'] = df['column name'].astype(np.int64)
#same as
#df['column name'] = df['column name'].astype('int64')
#(older answers used pd.np.int64, but pd.np was deprecated in pandas 1.0 and removed in 2.0)
print(df['column name'])
0    7500000
1    7500000
Name: column name, dtype: int64
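astype(np.int64) drops the fractional part rather than rounding. For whole-number data like 7500000.0 that makes no difference, but a quick check on made-up values shows the behaviour and the usual fix of rounding first:

```python
import pandas as pd

s = pd.Series([2.7, -2.7, 7500000.0])

# astype("int64") truncates toward zero: 2.7 -> 2 and -2.7 -> -2
truncated = s.astype("int64")

# Round first when the nearest integer is wanted instead
rounded = s.round().astype("int64")

print(truncated.tolist(), rounded.tolist())  # [2, -2, 7500000] [3, -3, 7500000]
```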
If there are NaNs in the column, you need to replace them with some int (e.g. 0) using fillna first, because NaN is a float value:
df = pd.DataFrame({'column name':[7500000.0,np.nan]})
df['column name'] = df['column name'].fillna(0).astype(np.int64)
print(df['column name'])
0    7500000
1          0
Name: column name, dtype: int64
Also check the documentation on missing data casting rules.
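EDIT:
Converting values with NaNs through the underlying NumPy array (.values) is buggy: instead of raising an error, the NaN silently becomes a meaningless integer (the exact value depends on the platform):
df = pd.DataFrame({'column name':[7500000.0,np.nan]})
df['column name'] = df['column name'].values.astype(np.int64)
print(df['column name'])
0                 7500000
1    -9223372036854775808
Name: column name, dtype: int64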
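One way to avoid that silent garbage value is to check for non-finite values before doing the NumPy cast. The helper to_int64_strict below is hypothetical, a sketch rather than a pandas API:

```python
import numpy as np
import pandas as pd


def to_int64_strict(s: pd.Series) -> pd.Series:
    """Hypothetical helper: cast a float Series to int64 via NumPy, but refuse NaN/inf.

    A bare .values.astype(np.int64) turns NaN into a meaningless integer,
    so check finiteness explicitly before casting.
    """
    arr = s.to_numpy(dtype="float64")
    bad = ~np.isfinite(arr)
    if bad.any():
        raise ValueError(f"non-finite values at index {list(s.index[bad])}")
    # No NaN/inf remain; finite values outside the int64 range would still overflow
    return pd.Series(arr.astype(np.int64), index=s.index, name=s.name)


ok = to_int64_strict(pd.Series([7500000.0, 12.0], name="column name"))
print(ok.dtype)  # int64

try:
    to_int64_strict(pd.Series([7500000.0, np.nan]))
except ValueError as exc:
    print(exc)
```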
You need to pass in the string 'int64':
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1.0, 2.0]})  # some test dataframe
>>> df['a'].astype('int64')
0    1
1    2
Name: a, dtype: int64
There are some alternative ways to specify 64-bit integers:
>>> df['a'].astype('i8')  # integer with 8 bytes (64 bit)
0    1
1    2
Name: a, dtype: int64
>>> import numpy as np
>>> df['a'].astype(np.int64)  # native numpy 64 bit integer
0    1
1    2
Name: a, dtype: int64
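The NameError in the question happens because a bare int64 is not a Python builtin; it has to come from NumPy (np.int64, or `from numpy import int64`) or be given as a string. A quick check that these spellings all name the same dtype:

```python
import numpy as np
import pandas as pd

# Four spellings of the same 64-bit signed integer dtype
spellings = ["int64", "i8", np.int64, np.dtype("int64")]

s = pd.Series([1.0, 2.0])
results = [s.astype(x) for x in spellings]

# Every result has the same dtype and the same values
print([str(r.dtype) for r in results])  # ['int64', 'int64', 'int64', 'int64']
```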
Or use np.int64 directly on your column (but it returns a numpy.ndarray, not a Series):
>>> np.int64(df['a'])
array([1, 2], dtype=int64)
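The difference matters when the column has a meaningful index. A small comparison on a made-up frame; Series.to_numpy(dtype=...) is the pandas method (available since 0.24) for getting a NumPy array directly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0]}, index=["x", "y"])

arr = np.int64(df["a"])                  # plain ndarray: index and name are lost
series = df["a"].astype("int64")         # Series: keeps index ["x", "y"] and name "a"
arr2 = df["a"].to_numpy(dtype="int64")   # pandas' own route to an int64 ndarray

print(type(arr).__name__, type(series).__name__)  # ndarray Series
```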