
Why does pandas Series.str convert numbers to NaN?

Tags: python, pandas

This might be a fundamental misunderstanding on my part, but I would expect pandas.Series.str to convert the pandas.Series values into strings.

However, when I do the following, numeric values in the series are converted to np.nan:

import pandas as pd

df = pd.DataFrame({'a': ['foo    ', 'bar', 42]})
df = df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)
print(df)

Out:
     a
0  foo
1  bar
2  NaN

If I first apply Python's str function to every element of each object column, numeric values are converted to strings instead of np.nan:

df = pd.DataFrame({'a': ['foo    ', 'bar', 42]})
df = df.apply(lambda x: x.apply(str) if x.dtype == 'object' else x)
df = df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)
print(df)

Out:
     a
0  foo
1  bar
2   42

The documentation is fairly scant on this topic. What am I missing?

Asked by Evan, Feb 25 '18 20:02



1 Answer

In this line:

df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)

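The x.dtype check looks at the entire Series (column), not at individual values. Because the column mixes strings with an integer, its dtype is object rather than a numeric dtype, so the whole column, including 42, is passed through .str.strip().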
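A quick way to see this is to compare the column's dtype with the types of its individual elements. This is only a minimal sketch that reuses the DataFrame from the question; the name element_types is introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['foo    ', 'bar', 42]})

# dtype describes the column as a whole: mixed values give object
print(df['a'].dtype)

# The individual elements keep their own Python types
element_types = [type(v).__name__ for v in df['a']]
print(element_types)
```

The dtype test in the lambda therefore says nothing about whether every element is actually a string.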

In your second example, the number is not preserved; it has become the string '42'.

The difference in the output comes from the difference between pandas' .str and Python's str.

With pandas .str, no conversion takes place. It is an accessor that lets you apply string methods such as .strip() to each element. Here that means attempting .strip() on an integer. The call fails for that element, and pandas responds by returning NaN for it.

In the case of .apply(str), you are actually converting the values to a string. Later when you apply .strip() this succeeds, since the value is already a string, and thus can be stripped.
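The two behaviours can be compared side by side on a single Series. The sketch below is an illustration, not the code from the question: it uses Series.astype(str) as a stand-in for .apply(str), and the variable names s, stripped and converted are made up for the example.

```python
import pandas as pd

s = pd.Series(['foo    ', 'bar', 42])  # mixed values, so dtype is object

# .str applies the string method element by element; the integer is
# not a string, so its slot comes back as NaN.
stripped = s.str.strip()

# Converting first turns 42 into '42', which .str.strip() can handle.
converted = s.astype(str).str.strip()

print(stripped.tolist())
print(converted.tolist())
```

Note that after the conversion the last element is text, so any later numeric work on it would need an explicit conversion back.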

Answered by Stephen Rauch, Sep 28 '22 08:09