Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas: to_numeric for multiple columns

Tags:

python

pandas

I'm working with the following df:

c.sort_values('2005', ascending=False).head(3)       GeoName ComponentName     IndustryId IndustryClassification Description                                2004 2005  2006  2007  2008  2009 2010 2011 2012 2013 2014 37926 Alabama Real GDP by state 9          213                    Support activities for mining              99   98    117   117   115   87   96   95   103  102  (NA) 37951 Alabama Real GDP by state 34         42                     Wholesale trade                            9898 10613 10952 11034 11075 9722 9765 9703 9600 9884 10199 37932 Alabama Real GDP by state 15         327                    Nonmetallic mineral products manufacturing 980  968   940   1084  861   724  714  701  589  641  (NA) 

I want to force numeric on all of the years:

c['2014'] = pd.to_numeric(c['2014'], errors='coerce') 

is there an easy way to do this or do I have to type them all out?

like image 802
Collective Action Avatar asked Apr 23 '16 17:04

Collective Action


People also ask

How do I convert multiple columns to float?

To convert the data type of multiple columns to float, use Pandas' apply(~) method with to_numeric(~) .

How do I convert multiple columns in pandas?

You can use the DataFrame. apply() and pd. to_datetime() function to convert multiple columns to DataTime. apply() function applies a function to each and every row and column of the DataFrame.

Can you use Groupby with multiple columns in pandas?

Pandas comes with a whole host of sql-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.


2 Answers

UPDATE: you don't need to convert your values afterwards, you can do it on-the-fly when reading your CSV:

In [165]: df=pd.read_csv(url, index_col=0, na_values=['(NA)']).fillna(0)  In [166]: df.dtypes Out[166]: GeoName                    object ComponentName              object IndustryId                  int64 IndustryClassification     object Description                object 2004                        int64 2005                        int64 2006                        int64 2007                        int64 2008                        int64 2009                        int64 2010                        int64 2011                        int64 2012                        int64 2013                        int64 2014                      float64 dtype: object 

If you need to convert multiple columns to numeric dtypes - use the following technique:

Sample source DF:

In [271]: df Out[271]:      id    a  b  c  d  e    f 0  id_3  AAA  6  3  5  8    1 1  id_9    3  7  5  7  3  BBB 2  id_7    4  2  3  5  4    2 3  id_0    7  3  5  7  9    4 4  id_0    2  4  6  4  0    2  In [272]: df.dtypes Out[272]: id    object a     object b      int64 c      int64 d      int64 e      int64 f     object dtype: object 

Converting selected columns to numeric dtypes:

In [273]: cols = df.columns.drop('id')  In [274]: df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')  In [275]: df Out[275]:      id    a  b  c  d  e    f 0  id_3  NaN  6  3  5  8  1.0 1  id_9  3.0  7  5  7  3  NaN 2  id_7  4.0  2  3  5  4  2.0 3  id_0  7.0  3  5  7  9  4.0 4  id_0  2.0  4  6  4  0  2.0  In [276]: df.dtypes Out[276]: id     object a     float64 b       int64 c       int64 d       int64 e       int64 f     float64 dtype: object 

PS if you want to select all string (object) columns use the following simple trick:

cols = df.columns[df.dtypes.eq('object')] 
like image 199
MaxU - stop WAR against UA Avatar answered Sep 22 '22 00:09

MaxU - stop WAR against UA


another way is using apply, one liner:

cols = ['col1', 'col2', 'col3'] data[cols] = data[cols].apply(pd.to_numeric, errors='coerce', axis=1) 
like image 44
muon Avatar answered Sep 21 '22 00:09

muon