Type conversion in Python: AttributeError: 'str' object has no attribute 'astype'

I am confused by type conversion in Python pandas.

df = pd.DataFrame({'a':['1.23', '0.123']})

Here df['a'] is a pandas Series whose contents are two strings. I can apply astype(float) to this Series, and it correctly converts all the strings into floats. However,

df['a'][1].astype(float)

gives me AttributeError: 'str' object has no attribute 'astype'. My question is: how can that be? I can convert the whole Series from string to float, but I can't convert a single entry of that Series from string to float?

Also, I load my raw data set


It raises ValueError: invalid literal for int() with base 10: ''. This seems to suggest that there is a blank value in df['id'], so I checked whether that is true by typing

'' in df['id']

It returns False, so I am very confused.

2 Answers

df['a'] returns a Series object, which has astype as a vectorized way to convert all elements of the Series to another type.

df['a'][1] returns the content of one cell of the DataFrame, in this case the string '0.123'. That is a plain str object, which does not have an astype method. To convert it, use the regular Python built-in:

In [25]: type(df['a'][1])
Out[25]: str

In [26]: float(df['a'][1])
Out[26]: 0.123

In [27]: type(float(df['a'][1]))
Out[27]: float
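
To make the distinction concrete, here is a minimal sketch that puts the two conversions side by side. It reuses the DataFrame from the question; the variable names (as_floats, cell, cell_as_float) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': ['1.23', '0.123']})

# Vectorized conversion: astype is a method of the Series object.
as_floats = df['a'].astype(float)

# A single cell is a plain Python str, so use the built-in float().
cell = df['a'].iloc[1]
cell_as_float = float(cell)

print(type(cell).__name__)   # str
print(cell_as_float)         # 0.123
print(as_floats.dtype)       # float64
```

Using .iloc[1] here rather than [1] is only a choice to make the positional lookup explicit; with the default index both return the same cell.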

As for your second question, the in operator ends up calling __contains__ on the Series with '' as the argument. Here is the docstring of that method:

Help on function __contains__ in module pandas.core.generic:

__contains__(self, key)
    True if the key is in the info axis

This means that the in operator searches for your empty string in the index of the Series, not in its contents.
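
A small sketch of this behavior, assuming a Series with the default integer index (the Series s and the variable names are made up for illustration):

```python
import pandas as pd

s = pd.Series(['42', ''])   # default index labels are 0 and 1

label_found = 0 in s               # membership is tested against the index labels
blank_in_index = '' in s           # '' is not an index label
blank_in_values = '' in s.values   # tests the stored values instead

print(label_found)       # True
print(blank_in_index)    # False
print(blank_in_values)   # True
```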

The way to find empty strings in the contents is to use the equality operator:

In [54]: df
Out[54]:
    a
0  42
1

In [55]: '' in df
Out[55]: False

In [56]: df == ''
Out[56]:
       a
0  False
1   True
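
The equality comparison can also be used as a boolean mask to locate the offending rows. The following is a sketch under the assumption that the column holds strings, with a two-row DataFrame made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': ['42', '']})

is_blank = df['a'] == ''     # boolean Series, one entry per row
has_blank = is_blank.any()   # True if at least one cell is ''
blank_rows = df[is_blank]    # only the rows whose 'a' is ''

print(has_blank)                   # True
print(blank_rows.index.tolist())   # [1]
```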

df['a'][1] returns the actual value stored at position 1, which is in fact a string. You can convert it with float(df['a'][1]).

>>> df = pd.DataFrame({'a':['1.23', '0.123']})
>>> type(df['a'])
<class 'pandas.core.series.Series'>
>>> df['a'].astype(float)
0    1.230
1    0.123
Name: a, dtype: float64
>>> type(df['a'][1])
<type 'str'>

For the second question, maybe you have an empty value in your raw data. The correct test would be:

>>> df = pd.DataFrame({'a':['1', '']})
>>> '' in df['a'].values
True

Source for the second question: https://stackoverflow.com/a/21320011/5335508
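
If a blank value does turn out to be the cause of the ValueError, one possible way to proceed (not something either answer prescribes) is to coerce unparseable entries to NaN with pd.to_numeric and decide what to do with them before casting to int. The 'id' data below is hypothetical, since the raw data set is not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for the raw data: one id is an empty string.
df = pd.DataFrame({'id': ['1', '', '3']})

# errors='coerce' turns values that cannot be parsed into NaN
# instead of raising an exception.
ids = pd.to_numeric(df['id'], errors='coerce')
n_missing = int(ids.isna().sum())

# Here the missing ids are simply dropped before the integer cast;
# whether dropping is appropriate depends on the data.
clean_ids = ids.dropna().astype(int)

print(n_missing)            # 1
print(clean_ids.tolist())   # [1, 3]
```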

