When to apply(pd.to_numeric) and when to astype(np.float64) in python?

Tags: python, types, pandas, dataframe, numpy

I have a pandas DataFrame object named xiv which has a column of int64 Volume measurements.

    In[]: xiv['Volume'].head(5)
    Out[]:
    0    252000
    1    484000
    2     62000
    3    168000
    4    232000
    Name: Volume, dtype: int64

I have read other posts (like this and this) that suggest the following solutions. But when I use either approach, it doesn't appear to change the dtype of the underlying data:

    In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])

    In[]: xiv['Volume'].dtypes
    Out[]: dtype('int64')

Or...

    In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
    Out[]: ###omitted for brevity###

    In[]: xiv['Volume'].dtypes
    Out[]: dtype('int64')

    In[]: xiv['Volume'] = xiv['Volume'].apply(pd.to_numeric)

    In[]: xiv['Volume'].dtypes
    Out[]: dtype('int64')

I've also tried making a separate pandas Series, applying the methods listed above to it, and assigning the result back to xiv['Volume'], which is a pandas.core.series.Series object.

I have, however, found a solution to this problem using the numpy package's float64 type - this works but I don't know why it's different.

    In[]: xiv['Volume'] = xiv['Volume'].astype(np.float64)

    In[]: xiv['Volume'].dtypes
    Out[]: dtype('float64')

Can someone explain how to accomplish with the pandas library what the numpy library seems to do easily with its float64 class, that is, convert the column in the xiv DataFrame to float64 in place?

asked Oct 17 '16 21:10

d8aninja

2 Answers

If the column already has a numeric dtype (int8|16|32|64, float64, boolean), you can convert it to another numeric dtype using the pandas .astype() method.

Demo:

    In [90]: df = pd.DataFrame(np.random.randint(10**5,10**7,(5,3)),columns=list('abc'), dtype=np.int64)

    In [91]: df
    Out[91]:
             a        b        c
    0  9059440  9590567  2076918
    1  5861102  4566089  1947323
    2  6636568   162770  2487991
    3  6794572  5236903  5628779
    4   470121  4044395  4546794

    In [92]: df.dtypes
    Out[92]:
    a    int64
    b    int64
    c    int64
    dtype: object

    In [93]: df['a'] = df['a'].astype(float)

    In [94]: df.dtypes
    Out[94]:
    a    float64
    b      int64
    c      int64
    dtype: object
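On the "in place" part of the question: .astype() returns a new object rather than modifying the column, so the result has to be assigned back. Here is a minimal sketch of that, using a made-up frame `demo` with hypothetical columns (not the question's data). It also shows the dict form of DataFrame.astype, which converts only the listed columns:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's xiv frame.
demo = pd.DataFrame(
    {"Volume": [252000, 484000, 62000], "Lots": [1, 2, 3]}, dtype="int64"
)

# astype() returns a new Series; the original column keeps its dtype
# until the result is assigned back.
converted = demo["Volume"].astype(np.float64)
dtype_before_assignment = demo["Volume"].dtype

demo["Volume"] = converted

# The dict form converts only the listed columns and returns a new DataFrame.
demo2 = demo.astype({"Lots": "float64"})

print(dtype_before_assignment, demo["Volume"].dtype, demo2["Lots"].dtype)
```

Passing `float`, `np.float64` or the string `"float64"` to astype should all give the same float64 result, so nothing here is specific to numpy beyond the dtype name.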

.astype() won't work for object (string) columns that contain values which can't be converted to numbers:

    In [95]: df.loc[1, 'b'] = 'XXXXXX'

    In [96]: df
    Out[96]:
               a        b        c
    0  9059440.0  9590567  2076918
    1  5861102.0   XXXXXX  1947323
    2  6636568.0   162770  2487991
    3  6794572.0  5236903  5628779
    4   470121.0  4044395  4546794

    In [97]: df.dtypes
    Out[97]:
    a    float64
    b     object
    c      int64
    dtype: object

    In [98]: df['b'].astype(float)
    ...
    skipped
    ...
    ValueError: could not convert string to float: 'XXXXXX'

So here we want to use the pd.to_numeric() function, with errors='coerce' so that values that can't be parsed become NaN:

    In [99]: df['b'] = pd.to_numeric(df['b'], errors='coerce')

    In [100]: df
    Out[100]:
               a          b        c
    0  9059440.0  9590567.0  2076918
    1  5861102.0        NaN  1947323
    2  6636568.0   162770.0  2487991
    3  6794572.0  5236903.0  5628779
    4   470121.0  4044395.0  4546794

    In [101]: df.dtypes
    Out[101]:
    a    float64
    b    float64
    c      int64
    dtype: object
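To tie this back to the question of why the two calls behave differently on an int64 column, here is a rough sketch. It assumes three small made-up Series (the names `ints`, `digit_strings` and `mixed` are illustrative only). pd.to_numeric parses values and picks a numeric dtype from what it finds, while .astype() casts to the dtype you name:

```python
import pandas as pd

# Hypothetical inputs; object dtype is requested explicitly for the strings.
ints = pd.Series([252000, 484000, 62000], dtype="int64")
digit_strings = pd.Series(["1", "2", "3"], dtype=object)
mixed = pd.Series(["1", "2", "XXXXXX"], dtype=object)

# Already-numeric data passes through to_numeric with its dtype unchanged.
unchanged = pd.to_numeric(ints)

# Strings that all look like integers are parsed to an integer dtype.
parsed_ints = pd.to_numeric(digit_strings)

# With errors='coerce' the bad value becomes NaN, which requires a float dtype.
coerced = pd.to_numeric(mixed, errors="coerce")

# astype casts to the requested dtype regardless of the current one.
cast = ints.astype("float64")

print(unchanged.dtype, parsed_ints.dtype, coerced.dtype, cast.dtype)
```

So on a column that is already int64, to_numeric has nothing to parse and leaves it alone, whereas astype(float) is an explicit request for float64.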

answered Sep 24 '22 13:09

MaxU - stop WAR against UA

I don't have a technical explanation for this, but I have noticed that pd.to_numeric() raises the following error when converting the string 'nan':

    In [10]: df = pd.DataFrame({'value': 'nan'}, index=[0])

    In [11]: pd.to_numeric(df.value)

    Traceback (most recent call last):
      File "<ipython-input-11-98729d13e45c>", line 1, in <module>
        pd.to_numeric(df.value)
      File "C:\Users\joshua.lee\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\tools\numeric.py", line 133, in to_numeric
        coerce_numeric=coerce_numeric)
      File "pandas/_libs/src\inference.pyx", line 1185, in pandas._libs.lib.maybe_convert_numeric
    ValueError: Unable to parse string "nan" at position 0

whereas astype(float) does not:

    In [12]: df.value.astype(float)
    Out[12]:
    0   NaN
    Name: value, dtype: float64
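If the goal is simply to end up with NaN for such values, errors='coerce' should sidestep the difference. The sketch below is not a statement about any particular pandas version: the plain call is wrapped in try/except because its behaviour on the string 'nan' may differ between releases, and the variable names are illustrative only.

```python
import pandas as pd

# Same one-row frame as above, with object dtype requested explicitly.
df = pd.DataFrame({"value": ["nan"]}, dtype=object)

# errors='coerce' maps anything that cannot be parsed to NaN, so the result
# is NaN whether or not the installed pandas parses the string 'nan' itself.
coerced = pd.to_numeric(df["value"], errors="coerce")

# astype(float) relies on float parsing, which accepts the string 'nan'.
cast = df["value"].astype(float)

# The plain call is wrapped so the sketch runs either way.
try:
    pd.to_numeric(df["value"])
    plain_outcome = "parsed"
except ValueError as exc:
    plain_outcome = f"raised: {exc}"

print(coerced.isna().all(), cast.isna().all(), plain_outcome)
```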

answered Sep 25 '22 13:09

reevesnmortimer
