Applying string functions to elements that can be NaN

I have a Pandas DataFrame with categorical data written by humans. Let's say this:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ["A", " A", "A ", "b", "B"]})
>>> df
  name
0    A
1    A
2   A
3    b
4    B

I want to normalize these values by stripping spaces and uppercasing them. This works great:

>>> df.apply(lambda x: x['name'].upper().strip(), axis=1)
0    A
1    A
2    A
3    B
4    B

The issue is that I also have a few NaN values, and I want those to remain NaN after this transformation. But if I have this:

>>> df2 = pd.DataFrame({'name': ["A", " A", "A ", "b", "B", np.nan]})
>>> df2.apply(lambda x: x['name'].upper().strip(), axis=1)
AttributeError: ("'float' object has no attribute 'upper'", u'occurred at index 5')

What I'd like is this:

0    A
1    A
2    A
3    B
4    B
5   NaN

I understand why this is happening (NaN is a float, while the other values are strings), but I can't find an elegant way of writing this.

Any thoughts?

user1496984 asked Nov 02 '15 23:11



1 Answer

You can use the vectorized string methods on the .str accessor. They skip missing values, so NaN stays NaN:

>>> df2.name.str.strip().str.upper()
0      A
1      A
2      A
3      B
4      B
5    NaN
Name: name, dtype: object
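
To keep the normalized values, the result can be assigned back to the DataFrame. The following is a minimal sketch, assuming the df2 from the question. The column names clean and clean_map are hypothetical and only there for illustration. The Series.map variant with na_action='ignore' is an alternative that is not part of the approach above; it may be useful when the transformation is an arbitrary Python function rather than a built-in string method.

```python
import numpy as np
import pandas as pd

# Same data as df2 in the question, including one missing value.
df2 = pd.DataFrame({'name': ["A", " A", "A ", "b", "B", np.nan]})

# Vectorized .str methods skip missing values, so the NaN row stays NaN.
# 'clean' is a hypothetical column name.
df2['clean'] = df2['name'].str.strip().str.upper()

# Alternative (assumption: you need a custom function): Series.map with
# na_action='ignore' only calls the function on non-missing values.
# 'clean_map' is a hypothetical column name.
df2['clean_map'] = df2['name'].map(
    lambda s: s.strip().upper(), na_action='ignore'
)

print(df2)
```

Both columns should hold the stripped, uppercased strings for rows 0 to 4 and a missing value for row 5. The .str version is usually the more idiomatic choice when a matching string method exists.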
Alexander answered Sep 28 '22 01:09