python: convert numerical data in pandas dataframe to floats in the presence of strings

Question

I've got a pandas dataframe with a column 'cap'. This column mostly consists of floats but has a few strings in it, for instance at index 1.

df =
    cap
0    5.2
1    na
2    2.2
3    7.6
4    7.5
5    3.0
...

I import my data from a csv file like so:

df = DataFrame(pd.read_csv(myfile.file))

Unfortunately, when I do this, the column 'cap' is imported entirely as strings. I would like floats to be identified as floats and strings as strings. Trying to convert this using:

df['cap'] = df['cap'].astype(float)

throws up an error:

could not convert string to float: na
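A minimal reproduction of the problem, assuming a small in-memory CSV in place of myfile.file (the csv_text string below is hypothetical and just holds the values shown above):

```python
import io

import pandas as pd

# Hypothetical stand-in for myfile.file, holding the 'cap' values from the question.
csv_text = "cap\n5.2\nna\n2.2\n7.6\n7.5\n3.0\n"

df = pd.read_csv(io.StringIO(csv_text))

# 'na' is not one of read_csv's default missing-value markers, so the whole
# column is parsed as text (object dtype in most pandas versions).
print(df["cap"].dtype)

try:
    df["cap"].astype(float)
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print("ValueError:", exc)
```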

Is there any way to make all the numbers into floats but keep the 'na' as a string?

Andy Hayden · Accepted Answer

Calculations on columns of float64 dtype (rather than object) are much more efficient, so float64 is usually preferred; it also lets you do other numeric calculations. For this reason it is recommended to use NaN for missing data (rather than your own placeholder, or None).
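To make the dtype point concrete, a small sketch comparing a column that marks the missing entry with a string placeholder against one that uses NaN (the series names are illustrative; the values are copied from the question):

```python
import numpy as np
import pandas as pd

# The same values as in the question, with two ways of marking the missing entry.
with_placeholder = pd.Series([5.2, "na", 2.2, 7.6, 7.5, 3.0])
with_nan = pd.Series([5.2, np.nan, 2.2, 7.6, 7.5, 3.0])

# A single string forces object dtype: each element is a separate Python object.
print(with_placeholder.dtype)  # object

# NaN is itself a float, so the column stays a packed float64 NumPy array.
print(with_nan.dtype)  # float64

# Vectorised arithmetic keeps float64 and carries the NaN through.
doubled = with_nan * 2
print(doubled.dtype)  # float64
```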

Is this really the answer you want?

In [11]: df.sum()  # all strings
Out[11]: 
cap    5.2na2.27.67.53.0
dtype: object

In [12]: df.apply(lambda f: to_number(f[0]), axis=1).sum()  # floats and 'na' strings
TypeError: unsupported operand type(s) for +: 'float' and 'str'
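The to_number function used in In [12] is not defined in the answer. A minimal sketch, assuming it is a hypothetical helper that converts a value to float when it parses and otherwise returns it unchanged; it uses Series.map instead of the row-wise apply, but reproduces the same TypeError:

```python
import pandas as pd

def to_number(value):
    """Hypothetical helper: return value as a float if it parses, otherwise unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value

# Raw strings, as read_csv produced them in the question (object dtype).
df = pd.DataFrame({"cap": ["5.2", "na", "2.2", "7.6", "7.5", "3.0"]}, dtype=object)

# Element-wise conversion: the numbers become floats, 'na' stays a string,
# so the result is still an object column.
mixed = df["cap"].map(to_number)
print(mixed.tolist())

try:
    mixed.sum()
    sum_failed = False
except TypeError as exc:
    # Adding a float to the str 'na' is not defined.
    sum_failed = True
    print("TypeError:", exc)
```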

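You should use convert_objects with convert_numeric=True to coerce to floats (this applies to older versions of pandas; convert_objects was later deprecated and has since been removed):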

In [21]: df.convert_objects(convert_numeric=True)
Out[21]: 
   cap
0  5.2
1  NaN
2  2.2
3  7.6
4  7.5
5  3.0
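In current pandas, a similar coercion can be done with pd.to_numeric and errors="coerce". A minimal sketch, assuming the 'cap' column still holds the raw strings as they came out of the CSV:

```python
import pandas as pd

# Raw strings, as read_csv produced them in the question (object dtype).
df = pd.DataFrame({"cap": ["5.2", "na", "2.2", "7.6", "7.5", "3.0"]}, dtype=object)

# errors="coerce" replaces anything that cannot be parsed as a number with NaN.
df["cap"] = pd.to_numeric(df["cap"], errors="coerce")

print(df)
print(df["cap"].dtype)  # float64
```

To coerce every column of a frame at once, the same function can be applied column by column with df.apply(pd.to_numeric, errors="coerce").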

Or read it in directly as a csv, by appending 'na' to the list of values to be considered NaN:

In [22]: pd.read_csv(myfile.file, na_values=['na'])
Out[22]: 
   cap
0  5.2
1  NaN
2  2.2
3  7.6
4  7.5
5  3.0
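A self-contained version of the read_csv route, assuming an in-memory CSV in place of myfile.file (csv_text is hypothetical). Passing na_values adds 'na' to pandas' default missing-value markers rather than replacing them:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for myfile.file.
csv_text = "cap\n5.2\nna\n2.2\n7.6\n7.5\n3.0\n"

# 'na' is treated as missing in addition to the defaults ('NA', 'NaN', 'n/a', ...).
df = pd.read_csv(io.StringIO(csv_text), na_values=["na"])

print(df["cap"].dtype)  # float64
print(df["cap"].sum())
```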

In either case, once the result is assigned back to df, sum (and many other pandas functions) will now work:

In [23]: df.sum()
Out[23]:
cap    25.5
dtype: float64
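A short sketch of a few of those NaN-aware operations on the cleaned column (values copied from the question). Pandas reductions skip NaN by default, and passing skipna=False propagates it instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"cap": [5.2, np.nan, 2.2, 7.6, 7.5, 3.0]})

print(df.sum())                      # NaN is skipped by default (skipna=True)
print(df["cap"].count())             # 5: count() excludes missing values
print(df["cap"].mean())              # average of the five present values
print(df["cap"].sum(skipna=False))   # nan: opting out of skipping propagates NaN
print(df["cap"].isna().sum())        # 1: number of missing entries
```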

As Jeff advises:

repeat 3 times fast: object==bad, float==good

Tags: python, pandas, dataframe

natsuki_2002