
When to use apply(pd.to_numeric) and when to use astype(np.float64) in Python?

I have a pandas DataFrame object named xiv which has a column of int64 Volume measurements.

    In[]: xiv['Volume'].head(5)
    Out[]:
    0    252000
    1    484000
    2     62000
    3    168000
    4    232000
    Name: Volume, dtype: int64

I have read other posts (like this and this) that suggest the following solutions. But when I use either approach, it doesn't appear to change the dtype of the underlying data:

    In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])

    In[]: xiv['Volume'].dtypes
    Out[]: dtype('int64')

Or...

    In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
    Out[]: ###omitted for brevity###

    In[]: xiv['Volume'].dtypes
    Out[]: dtype('int64')

    In[]: xiv['Volume'] = xiv['Volume'].apply(pd.to_numeric)

    In[]: xiv['Volume'].dtypes
    Out[]: dtype('int64')

I've also tried making a separate pandas Series, applying the methods listed above to that Series, and reassigning the result to the xiv['Volume'] object, which is a pandas.core.series.Series object.

I have, however, found a solution to this problem using the numpy package's float64 type - this works but I don't know why it's different.

    In[]: xiv['Volume'] = xiv['Volume'].astype(np.float64)

    In[]: xiv['Volume'].dtypes
    Out[]: dtype('float64')

Can someone explain how to accomplish with the pandas library what the numpy library seems to do easily with its float64 class, that is, convert the column in the xiv DataFrame to float64 in place?

d8aninja asked Oct 17 '16 21:10




2 Answers

If you already have numeric dtypes (int8|16|32|64, float64, boolean), you can convert them to another "numeric" dtype using the pandas .astype() method.

Demo:

    In [90]: df = pd.DataFrame(np.random.randint(10**5,10**7,(5,3)),columns=list('abc'), dtype=np.int64)

    In [91]: df
    Out[91]:
             a        b        c
    0  9059440  9590567  2076918
    1  5861102  4566089  1947323
    2  6636568   162770  2487991
    3  6794572  5236903  5628779
    4   470121  4044395  4546794

    In [92]: df.dtypes
    Out[92]:
    a    int64
    b    int64
    c    int64
    dtype: object

    In [93]: df['a'] = df['a'].astype(float)

    In [94]: df.dtypes
    Out[94]:
    a    float64
    b      int64
    c      int64
    dtype: object

It won't work for object (string) dtypes that contain values which can't be converted to numbers:

    In [95]: df.loc[1, 'b'] = 'XXXXXX'

    In [96]: df
    Out[96]:
               a        b        c
    0  9059440.0  9590567  2076918
    1  5861102.0   XXXXXX  1947323
    2  6636568.0   162770  2487991
    3  6794572.0  5236903  5628779
    4   470121.0  4044395  4546794

    In [97]: df.dtypes
    Out[97]:
    a    float64
    b     object
    c      int64
    dtype: object

    In [98]: df['b'].astype(float)
    ... skipped ...
    ValueError: could not convert string to float: 'XXXXXX'

So here we want to use the pd.to_numeric() function with errors='coerce', which turns unparseable values into NaN:

    In [99]: df['b'] = pd.to_numeric(df['b'], errors='coerce')

    In [100]: df
    Out[100]:
               a          b        c
    0  9059440.0  9590567.0  2076918
    1  5861102.0        NaN  1947323
    2  6636568.0   162770.0  2487991
    3  6794572.0  5236903.0  5628779
    4   470121.0  4044395.0  4546794

    In [101]: df.dtypes
    Out[101]:
    a    float64
    b    float64
    c      int64
    dtype: object
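To tie this back to the question, here is a minimal sketch. The Series below is rebuilt from the five sample values shown in the question, not the asker's real data, and the variable names are made up for illustration. As far as I understand it, pd.to_numeric() only parses values into some numeric dtype, so a column that is already int64 comes back unchanged, while .astype() casts to exactly the dtype you name:

```python
import numpy as np
import pandas as pd

# Stand-in for xiv['Volume'], built from the sample values in the question.
volume = pd.Series([252000, 484000, 62000, 168000, 232000],
                   name="Volume", dtype="int64")

# Already numeric, so to_numeric has nothing to parse and keeps int64.
same = pd.to_numeric(volume)

# astype requests a specific target dtype, so the cast to float64 happens.
as_float = volume.astype(np.float64)

# to_numeric earns its keep when the values are strings (object dtype).
text = pd.Series(["252000", "484000"], dtype=object)
parsed = pd.to_numeric(text)

print(same.dtype, as_float.dtype, parsed.dtype)
```

Neither call modifies the column in place; both return a new Series, so the result has to be assigned back, as in xiv['Volume'] = xiv['Volume'].astype(np.float64).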
MaxU - stop WAR against UA answered Sep 24 '22 13:09


I don't have a technical explanation for this, but I have noticed that pd.to_numeric() raises the following error when converting the string 'nan':

    In [10]: df = pd.DataFrame({'value': 'nan'}, index=[0])

    In [11]: pd.to_numeric(df.value)
    Traceback (most recent call last):
      File "<ipython-input-11-98729d13e45c>", line 1, in <module>
        pd.to_numeric(df.value)
      File "C:\Users\joshua.lee\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\tools\numeric.py", line 133, in to_numeric
        coerce_numeric=coerce_numeric)
      File "pandas/_libs/src\inference.pyx", line 1185, in pandas._libs.lib.maybe_convert_numeric
    ValueError: Unable to parse string "nan" at position 0

whereas astype(float) does not:

    In [12]: df.value.astype(float)
    Out[12]:
    0   NaN
    Name: value, dtype: float64
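If the goal is a float column no matter how missing values are spelled, something along these lines may help. This is only a sketch: the sample strings are hypothetical, and how pd.to_numeric() treats the literal string 'nan' under its default error handling may differ between pandas versions, so the example passes errors='coerce' explicitly rather than relying on the default:

```python
import numpy as np
import pandas as pd

# Hypothetical messy column: a 'nan' string, a valid number, and junk text.
raw = pd.Series(["nan", "1.5", "XXXXXX"], dtype=object)

# With errors='coerce', anything that is not a usable number ends up as NaN.
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced.dtype)
print(coerced.isna().tolist())

# astype(float) accepts 'nan' but still raises on text such as 'XXXXXX'.
try:
    raw.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True
print(astype_failed)
```

So astype(float) is the stricter option when every value is known to be float-like, and to_numeric with errors='coerce' is the more forgiving one.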
reevesnmortimer answered Sep 25 '22 13:09