I've read an SQL query into Pandas and the values are coming in as dtype 'object', although they are strings, dates and integers. I am able to convert the date 'object' to a Pandas datetime dtype, but I'm getting an error when trying to convert the string and integers.
Here is an example:
>>> import pandas as pd >>> df = pd.read_sql_query('select * from my_table', conn) >>> df id date purchase 1 abc1 2016-05-22 1 2 abc2 2016-05-29 0 3 abc3 2016-05-22 2 4 abc4 2016-05-22 0 >>> df.dtypes id object date object purchase object dtype: object
Converting the df['date']
to a datetime works:
>>> pd.to_datetime(df['date']) 1 2016-05-22 2 2016-05-29 3 2016-05-22 4 2016-05-22 Name: date, dtype: datetime64[ns]
But I get an error when trying to convert the df['purchase']
to an integer:
>>> df['purchase'].astype(int) .... pandas/lib.pyx in pandas.lib.astype_intsafe (pandas/lib.c:16667)() pandas/src/util.pxd in util.set_value_at (pandas/lib.c:67540)() TypeError: long() argument must be a string or a number, not 'java.lang.Long'
NOTE: I get a similar error when I tried .astype('float')
And when trying to convert to a string, nothing seems to happen.
>>> df['id'].apply(str) 1 abc1 2 abc2 3 abc3 4 abc4 Name: id, dtype: object
The best way to convert one or more columns of a DataFrame to numeric values is to use pandas. to_numeric(). This function will try to change non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
In order to convert data types in pandas, there are three basic options: Use astype() to force an appropriate dtype. Create a custom function to convert the data. Use pandas functions such as to_numeric() or to_datetime()
The dtypes property is used to find the dtypes in the DataFrame. This returns a Series with the data type of each column. The result's index is the original DataFrame's columns. Columns with mixed types are stored with the object dtype.
astype() method is used to cast a pandas object to a specified dtype. astype() function also provides the capability to convert any suitable existing column to categorical type. DataFrame. astype() function comes very handy when we want to case a particular column data type to another data type.
Documenting the answer that worked for me based on the comment by @piRSquared.
I needed to convert to a string first, then an integer.
>>> df['purchase'].astype(str).astype(int)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With