
modifying many columns in pandas dataframe

I have been stuck on this for a while and no amount of googling seems to help.

I am reading in a lot of raw data. Some of the variables come in as objects because the source uses letters to mark various kinds of missing values (the distinctions between them do not matter to me).

So I want to run a fairly large subset of columns through pandas.to_numeric(___, errors='coerce') just to force them to be cast as int or float (again, I do not care much which, just that they are numeric).

I can make this happen column by column easily:

df['col_name'] = pd.to_numeric(df['col_name'], errors='coerce') 

However, I have some 60 columns I want to cast like this, so I thought this would work:

numeric = ['lots', 'a', 'columns']
for item in numeric:
    df[item] = pd.to_numeric(df[item], errors='coerce')

The error I get is:

Traceback (most recent call last):

File "/Users/____/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)

File "<ipython-input-53-43b873fbd712>", line 2, in <module>
df_detail[item] = pd.to_numeric(dfl[item], errors='coerce')

File "/Users/____/anaconda/lib/python2.7/site-packages/pandas/tools/util.py", line 101, in to_numeric
raise TypeError('arg must be a list, tuple, 1-d array, or Series')

TypeError: arg must be a list, tuple, 1-d array, or Series

I have tried many variations. I suspect this has something to do with the list or with looping through it, because I get the very same error when the for-loop simply calls df[item].describe().

From my (still novice) understanding of Python, this should work. I am at a loss. Thanks.

dozyaustin asked Oct 01 '16 05:10



1 Answer

First of all, see this answer
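One possible cause of the TypeError in the question, offered as an assumption because the question does not show the column index: if a label in the list appears more than once among the columns, `df[item]` returns a DataFrame rather than a Series, and `pd.to_numeric` rejects 2-d input. A minimal sketch with a made-up frame that has a duplicated label `'a'`:

```python
import pandas as pd

# Hypothetical frame: the label 'a' appears twice in the columns.
df = pd.DataFrame([[1, 'x', '3'], ['2', 'y', 4]], columns=['a', 'a', 'b'])

# With a duplicated label, selecting by that label gives a DataFrame.
selected = df['a']
print(type(selected).__name__)

# pd.to_numeric only accepts 1-d input, so a DataFrame raises TypeError.
try:
    pd.to_numeric(selected, errors='coerce')
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# List any labels that occur more than once.
duplicated_labels = df.columns[df.columns.duplicated()].tolist()
print(duplicated_labels)
```

If `duplicated_labels` comes back empty on the real data, this is not the cause and the column index should be inspected some other way.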

# Let
numeric = ['lots', 'a', 'columns']

Option 1

df[numeric] = df[numeric].apply(pd.to_numeric, errors='coerce')
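As a self-contained sketch of Option 1: `DataFrame.apply` calls `pd.to_numeric` once per column, so each call receives a Series, and the extra keyword `errors='coerce'` is forwarded to it. The frame below and its `label` column are invented for illustration.

```python
import pandas as pd

# Hypothetical raw data: letters stand in for missing values.
df = pd.DataFrame({
    'lots': [1, 'M', '3'],
    'a': [1.5, 'N', '2'],
    'columns': ['x', '7', 8],
    'label': ['p', 'q', 'r'],   # not in the list, so left untouched
})
numeric = ['lots', 'a', 'columns']

# Convert every listed column; unparseable entries become NaN.
df[numeric] = df[numeric].apply(pd.to_numeric, errors='coerce')

all_numeric = all(pd.api.types.is_numeric_dtype(df[c]) for c in numeric)
print(df.dtypes)
print(all_numeric)
```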

Option 2

df.loc[:, numeric] = pd.to_numeric(df[numeric].values.ravel(), errors='coerce') \
                       .reshape(-1, len(numeric))
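Option 2 flattens the selected block to one dimension, converts it in a single `pd.to_numeric` call, and reshapes the result back to the original number of columns. The sketch below spells those steps out on an invented frame. It assigns with `df[numeric] = ...` rather than through `.loc`, on the assumption that recent pandas versions may write the values into the existing object columns in place and so keep the `object` dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed strings and numbers.
df = pd.DataFrame([[1, 'M', '2.5'], ['N', 3, 'x']], columns=['p', 'q', 'r'])
numeric = ['p', 'q', 'r']

# Flatten the 2-d block row by row into a 1-d object array.
flat = df[numeric].to_numpy().ravel()

# Convert in one call, then restore the (rows, columns) shape.
converted = pd.to_numeric(flat, errors='coerce').reshape(-1, len(numeric))

# Replace the columns with the converted float values.
df[numeric] = converted
print(df.dtypes)
```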

Demonstration
Consider the dataframe df

df = pd.DataFrame([
        [1, 'a', 2],
        ['b', 3, 'c'],
        ['4', 'd', '5']
    ], columns=['A', 'B', 'C'])

Then both options above yield

     A    B    C
0  1.0  NaN  2.0
1  NaN  3.0  NaN
2  4.0  NaN  5.0

piRSquared answered Sep 27 '22 16:09