
Pandas convert strings to numeric if possible; else keep string values

I have a pandas DataFrame with columns that look something like this:

df:

Column0   Column1     Column2
'MSC'       '1'        'R2'
'MIS'       'Tuesday'  '22'
'13'        'Finance'  'Monday'

So these columns contain genuine strings, but also numeric values (integers) stored as strings.
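To make the example reproducible, the frame above can be built directly. This is a sketch that assumes every cell holds a Python str, as the quotes in the table suggest, and passes dtype=object so the cells stay plain Python objects.

```python
import pandas as pd

# Sample data from the question: every cell is a string, including '1', '22' and '13'
df = pd.DataFrame(
    {
        "Column0": ["MSC", "MIS", "13"],
        "Column1": ["1", "Tuesday", "Finance"],
        "Column2": ["R2", "22", "Monday"],
    },
    dtype=object,
)

print(df)
```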

I found this helpful post about the pd.to_numeric and astype() methods, but I can't see whether or how I could use them in my case.

Using:

df.apply(pd.to_numeric, errors='ignore')

just skips the whole columns: with errors='ignore', a column is returned unchanged as soon as any of its values fails to parse. Instead of skipping whole columns, I only want to skip the individual strings that can't be converted, move on to the next entry and try to convert that one.
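pd.to_numeric accepts a scalar, list, 1-D array or Series, not a whole DataFrame, which is why it is applied column by column. With the default errors='raise', one unparsable string makes the call fail for the entire column; errors='coerce' instead converts element by element and turns each failure into NaN. The sketch below uses the question's Column1 as a standalone Series named col (a name chosen for illustration). errors='ignore' is left out because pandas deprecated it in version 2.2.

```python
import pandas as pd

# Column1 from the question, kept as plain Python strings (object dtype)
col = pd.Series(["1", "Tuesday", "Finance"], dtype=object, name="Column1")

# Default errors='raise': a single unparsable string makes the whole call fail
try:
    pd.to_numeric(col)
    raised = False
except ValueError:
    raised = True
print("raised:", raised)  # raised: True

# errors='coerce': each element is parsed independently; failures become NaN
coerced = pd.to_numeric(col, errors="coerce")
print(coerced.tolist())  # [1.0, nan, nan]
```

The coerced result is float64 because NaN has no integer representation, which is why the numbers come back as 1.0 rather than 1.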

So in the end, my DataFrame should look like this:

df:

Column0   Column1     Column2
'MSC'       1          'R2'
'MIS'      'Tuesday'    22
 13        'Finance'  'Monday'

Is there an efficient way to loop over these columns and achieve that?
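One way to do the per-cell conversion while keeping integers as ints is sketched below. It is an alternative to the to_numeric approach, not taken from the thread: a hypothetical helper, to_number_if_possible, tries int first, then float, and returns the original value when both fail; Series.map applies it to every cell of each column. Because it runs Python code per cell, it is slower than the vectorized pd.to_numeric, which mostly matters on large frames.

```python
import pandas as pd


def to_number_if_possible(value):
    """Hypothetical helper: parse a string as int, then float; otherwise return it unchanged."""
    if not isinstance(value, str):
        return value  # leave numbers, None and NaN untouched
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value  # not numeric: keep the original string


df = pd.DataFrame(
    {
        "Column0": ["MSC", "MIS", "13"],
        "Column1": ["1", "Tuesday", "Finance"],
        "Column2": ["R2", "22", "Monday"],
    },
    dtype=object,
)

# Series.map calls the helper on every cell of each column
df2 = df.apply(lambda column: column.map(to_number_if_possible))
print(df2.values.tolist())
# [['MSC', 1, 'R2'], ['MIS', 'Tuesday', 22], [13, 'Finance', 'Monday']]
```

Note that strings such as 'nan' or 'inf' also parse successfully as floats with this helper.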

Best regards, Jan

EDIT: Thanks for all your suggestions! Since I am still a Python beginner, @coldspeed's and @sacul's answers are easier for me to understand, so I will go with one of them!

asked Dec 04 '18 at 23:12 by JanB



1 Answer

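I agree 100% with the comments: mixing dtypes in a column is a terrible idea performance-wise, because such a column falls back to the generic object dtype and loses pandas' fast vectorized numeric operations.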
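A quick way to see that cost, sketched with illustrative values from the question: once numbers and strings share a column, pandas stores it with the object dtype, so its values are held as individual Python objects and the column drops out of numeric-only selections.

```python
import pandas as pd

numeric = pd.Series([1, 22, 13])
mixed = pd.Series([1, "Tuesday", 13])

print(numeric.dtype)  # int64
print(mixed.dtype)    # object

# Numeric-only selection silently excludes the mixed column
frame = pd.DataFrame({"numeric": numeric, "mixed": mixed})
print(frame.select_dtypes("number").columns.tolist())  # ['numeric']
```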

For reference, however, I would do this with pd.to_numeric and fillna:

df2 = df.apply(pd.to_numeric, errors='coerce').fillna(df)
print(df2)
  Column0  Column1 Column2
0     MSC        1      R2
1     MIS  Tuesday      22
2      13  Finance  Monday

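The resulting columns have the object dtype because they mix strings and numbers, which no single numeric dtype can hold. The converted numbers are floats rather than ints, because coercing the unparsable strings to NaN makes each intermediate column float64. You can see this when you extract the values: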

print(df2.values.tolist())
[['MSC', 1.0, 'R2'], ['MIS', 'Tuesday', 22.0], [13.0, 'Finance', 'Monday']]
answered Sep 21 '22 at 20:09 by cs95
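A self-contained version of the to_numeric/fillna approach from the answer, assuming the question's data is stored as strings. It confirms the float results, then adds an optional extra step that is not part of the answer: a hypothetical helper, whole_float_to_int, turns whole-number floats such as 13.0 back into ints to match the output the question asked for.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Column0": ["MSC", "MIS", "13"],
        "Column1": ["1", "Tuesday", "Finance"],
        "Column2": ["R2", "22", "Monday"],
    },
    dtype=object,
)

# Vectorized per column: unparsable strings become NaN, then are refilled from df
df2 = df.apply(pd.to_numeric, errors="coerce").fillna(df)
print(df2.values.tolist())
# [['MSC', 1.0, 'R2'], ['MIS', 'Tuesday', 22.0], [13.0, 'Finance', 'Monday']]


def whole_float_to_int(value):
    """Hypothetical helper: turn floats such as 13.0 into ints; leave everything else alone."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


# Optional extra step: restore integer values cell by cell
df3 = df2.apply(lambda column: column.map(whole_float_to_int))
print(df3.values.tolist())
# [['MSC', 1, 'R2'], ['MIS', 'Tuesday', 22], [13, 'Finance', 'Monday']]
```

The extra step cannot tell '13' from '13.0' after parsing: a cell that originally read '2.0' would also end up as the int 2.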