pandas: to_numeric for multiple columns

Tags:

python

pandas

I'm working with the following df:

c.sort_values('2005', ascending=False).head(3)       GeoName ComponentName     IndustryId IndustryClassification Description                                2004 2005  2006  2007  2008  2009 2010 2011 2012 2013 2014 37926 Alabama Real GDP by state 9          213                    Support activities for mining              99   98    117   117   115   87   96   95   103  102  (NA) 37951 Alabama Real GDP by state 34         42                     Wholesale trade                            9898 10613 10952 11034 11075 9722 9765 9703 9600 9884 10199 37932 Alabama Real GDP by state 15         327                    Nonmetallic mineral products manufacturing 980  968   940   1084  861   724  714  701  589  641  (NA)

I want to force numeric on all of the years:

c['2014'] = pd.to_numeric(c['2014'], errors='coerce')

is there an easy way to do this or do I have to type them all out?

802

asked Apr 23 '16 17:04

Collective Action

2 Answers

UPDATE: you don't need to convert your values afterwards, you can do it on-the-fly when reading your CSV:

In [165]: df=pd.read_csv(url, index_col=0, na_values=['(NA)']).fillna(0)  In [166]: df.dtypes Out[166]: GeoName                    object ComponentName              object IndustryId                  int64 IndustryClassification     object Description                object 2004                        int64 2005                        int64 2006                        int64 2007                        int64 2008                        int64 2009                        int64 2010                        int64 2011                        int64 2012                        int64 2013                        int64 2014                      float64 dtype: object

If you need to convert multiple columns to numeric dtypes - use the following technique:

Sample source DF:

In [271]: df Out[271]:      id    a  b  c  d  e    f 0  id_3  AAA  6  3  5  8    1 1  id_9    3  7  5  7  3  BBB 2  id_7    4  2  3  5  4    2 3  id_0    7  3  5  7  9    4 4  id_0    2  4  6  4  0    2  In [272]: df.dtypes Out[272]: id    object a     object b      int64 c      int64 d      int64 e      int64 f     object dtype: object

Converting selected columns to numeric dtypes:

In [273]: cols = df.columns.drop('id')  In [274]: df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')  In [275]: df Out[275]:      id    a  b  c  d  e    f 0  id_3  NaN  6  3  5  8  1.0 1  id_9  3.0  7  5  7  3  NaN 2  id_7  4.0  2  3  5  4  2.0 3  id_0  7.0  3  5  7  9  4.0 4  id_0  2.0  4  6  4  0  2.0  In [276]: df.dtypes Out[276]: id     object a     float64 b       int64 c       int64 d       int64 e       int64 f     float64 dtype: object

PS if you want to select all string (object) columns use the following simple trick:

cols = df.columns[df.dtypes.eq('object')]

199

answered Sep 22 '22 00:09

MaxU - stop WAR against UA

another way is using apply, one liner:

cols = ['col1', 'col2', 'col3'] data[cols] = data[cols].apply(pd.to_numeric, errors='coerce', axis=1)

answered Sep 21 '22 00:09

muon

Related questions
                            
                                ModuleNotFoundError: No module named 'sklearn'
                            
                                How do I set up Setuptools for Python 2.6 on Windows?
                            
                                How to find overlapping matches with a regexp?
                            
                                How should I understand the output of dis.dis?
                            
                                How to assign a value to a TensorFlow variable?
                            
                                fit_transform() takes 2 positional arguments but 3 were given with LabelBinarizer
                            
                                Byte Array to Hex String
                            
                                How can I build a model to distinguish tweets about Apple (Inc.) from tweets about apple (fruit)?
                            
                                any() function in Python with a callback
                            
                                NameError: name 'reload' is not defined
                            
                                Django: Catching Integrity Error and showing a customized message using template
                            
                                Redefining the Index in a Pandas DataFrame object
                            
                                Is it possible exclude test directories from coverage.py reports?
                            
                                Dynamically changing log level without restarting the application
                            
                                pandas out of bounds nanosecond timestamp after offset rollforward plus adding a month offset
                            
                                Python: How to pip install opencv2 with specific version 2.4.9?
                            
                                split a multi-page pdf file into multiple pdf files with python?
                            
                                Move legend outside figure in seaborn tsplot
                            
                                How to parse dates with -0400 timezone string in Python?
                            
                                How to remove extra indentation of Python triple quoted multi-line strings?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With