I have been stuck on this for a while and no amount of googling seems to help.
I am reading in a lot of raw data. Some of the variables come in as objects because the source uses letters to mark various kinds of missing values (which I do not care about).
So I want to run a fairly large subset of columns through pandas.to_numeric(___, errors='coerce')
just to force these to be cast as int or float (again, I do not care too much which, just that they are numeric).
I can make this happen column by column easily:
df['col_name'] = pd.to_numeric(df['col_name'], errors='coerce')
However, I have some 60 columns I want to cast like this, so I thought this would work:
numeric = ['lots', 'a', 'columns']
for item in numeric:
    df_[item] = pd.to_numeric(df[item], errors='coerce')
The error I get is:
Traceback (most recent call last):
File "/Users/____/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-53-43b873fbd712>", line 2, in <module>
df_detail[item] = pd.to_numeric(dfl[item], errors='coerce')
File "/Users/____/anaconda/lib/python2.7/site-packages/pandas/tools/util.py", line 101, in to_numeric
raise TypeError('arg must be a list, tuple, 1-d array, or Series')
TypeError: arg must be a list, tuple, 1-d array, or Series
I tried many versions. This has something to do with the list or with looping through it. I get the very same error when the for-loop simply calls df[item].describe()
From my (still novice) understanding of Python, this should work. I am at a loss. Thanks
First of all, see this answer
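The linked answer is not reproduced here, so the following is only a guess at the cause, not something the question confirms. One thing worth checking is whether any column label occurs more than once: in that case df[item] returns a DataFrame rather than a Series, and pd.to_numeric rejects it with this TypeError. A minimal sketch with a hypothetical frame named dup:

```python
import pandas as pd

# Hypothetical frame with a duplicated column label 'a' (an assumption for
# illustration; the asker's real column names are not shown).
dup = pd.DataFrame([[1, 'x', 2], ['3', 4, 'y']], columns=['a', 'a', 'b'])

# Selecting a duplicated label returns a DataFrame, not a Series.
selected = dup['a']
is_frame = isinstance(selected, pd.DataFrame)

# pd.to_numeric only accepts 1-d input, so passing a DataFrame raises TypeError.
try:
    pd.to_numeric(selected, errors='coerce')
    raised = False
except TypeError:
    raised = True

# List the labels that occur more than once.
duplicated_labels = dup.columns[dup.columns.duplicated()].tolist()
print(is_frame, raised, duplicated_labels)
```

If duplicated_labels comes back empty for your own frame, this is not the cause and the problem lies elsewhere.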
# Let
numeric = ['lots', 'a', 'columns']
Option 1
df[numeric] = df[numeric].apply(pd.to_numeric, errors='coerce')
Option 2
df.loc[:, numeric] = pd.to_numeric(df[numeric].values.ravel(), errors='coerce') \
    .reshape(-1, len(numeric))
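To make the flatten-and-reshape step in Option 2 easier to follow, here is a small sketch on a standalone array. The name block and its values are made up for illustration and stand in for df[numeric].values:

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 block of mixed values, standing in for df[numeric].values.
block = np.array([[1, 'x'], ['2.5', 3]], dtype=object)

# Flatten to 1-d, because pd.to_numeric does not accept 2-d input.
flat = block.ravel()

# Convert in one pass; anything unparseable ('x') becomes NaN.
converted = pd.to_numeric(flat, errors='coerce')

# Restore the original two-column layout.
restored = converted.reshape(-1, block.shape[1])
print(restored)
```

The appeal of this route is a single to_numeric call over all values instead of one call per column. The cost is that every column ends up sharing one dtype.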
Demonstration
Consider the dataframe df
df = pd.DataFrame([
[1, 'a', 2],
['b', 3, 'c'],
['4', 'd', '5']
], columns=['A', 'B', 'C'])
Then both options above yield
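The output did not survive in this copy, so here is a runnable sketch of Option 1 on the example frame. It assumes numeric is set to all three example columns, which the answer does not state explicitly:

```python
import pandas as pd

df = pd.DataFrame([
    [1, 'a', 2],
    ['b', 3, 'c'],
    ['4', 'd', '5']
], columns=['A', 'B', 'C'])

# Assumption: convert every column of the example frame.
numeric = ['A', 'B', 'C']

# Option 1: apply pd.to_numeric column by column; non-numeric entries become NaN.
df[numeric] = df[numeric].apply(pd.to_numeric, errors='coerce')
print(df)
#      A    B    C
# 0  1.0  NaN  2.0
# 1  NaN  3.0  NaN
# 2  4.0  NaN  5.0
```

Every column comes out as float64 because NaN forces a floating-point dtype.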