How to test for NaNs in an apply function in pandas?

I have a simple apply function that I run on some of the columns, but it keeps getting tripped up by NaN values.

import random
import re
import numpy as np
import pandas as pd

# Five rows of sample data; the first row has an empty string in column D
input_data = np.array(
    [
        [random.randint(0,9) for x in range(2)]+['']+['g'],
        [random.randint(0,9) for x in range(3)]+['g'],
        [random.randint(0,9) for x in range(3)]+['a'],
        [random.randint(0,9) for x in range(3)]+['b'],
        [random.randint(0,9) for x in range(3)]+['b']
    ]
)

input_df = pd.DataFrame(data=input_data, columns=['B', 'C', 'D', 'label'])

I have a simple lambda like this:

input_df['D'].apply(lambda aCode: re.sub('\.', '', aCode) if not np.isnan(aCode) else aCode)

And it gets tripped up by the NaN values:

File "<pyshell#460>", line 1, in <lambda>
    input_df['D'].apply(lambda aCode: re.sub('\.', '', aCode) if not np.isnan(aCode) else aCode)
TypeError: Not implemented for this type

So, I tried just testing for nan values that Pandas adds:

np.isnan(input_df['D'].values[0])
np.isnan(input_df['D'].iloc[0])

Both get the same error.

I do not know how to test for nan values other than np.isnan. Is there an easier way to do this? Thanks.

makansij asked Feb 05 '16 20:02




1 Answer

Your code fails because the first entry is an empty string, and np.isnan only accepts numeric input. It raises a TypeError for any string, empty or not:

In [55]:
input_df['D'].iloc[0]

Out[55]:
''

In [56]:
np.isnan('')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-56-a9f139a0c5b8> in <module>()
----> 1 np.isnan('')

TypeError: Not implemented for this type

pd.notnull does work, because it accepts values of any type:

In [57]:
import re
input_df['D'].apply(lambda aCode: re.sub(r'\.', '', aCode) if pd.notnull(aCode) else aCode)

Out[57]:
0     
1    3
2    3
3    0
4    3
Name: D, dtype: object
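
As a rough, self-contained sketch of the same idea: the Series below is a hypothetical sample (not the data from the question) holding an empty string, two dotted strings, and a real NaN, and the helper name strip_dots is made up for illustration. It shows np.isnan rejecting a string while a pd.notnull guard lets the apply run through every element.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample: an empty string, strings with dots, and a real NaN.
codes = pd.Series(['', '1.2', np.nan, '3.4.5'], dtype=object)


def strip_dots(value):
    """Remove literal dots from strings; pass missing values through unchanged."""
    if pd.notnull(value):
        return re.sub(r'\.', '', value)
    return value


# np.isnan rejects strings outright, which is the error from the question.
try:
    np.isnan('')
    isnan_accepts_str = True
except TypeError:
    isnan_accepts_str = False

cleaned = codes.apply(strip_dots)
print(isnan_accepts_str)   # False
print(cleaned.tolist())
```

Note that pd.notnull('') is True, so the empty string is passed to re.sub and comes back unchanged; only the real NaN takes the else branch.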

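However, if all you need is a string replacement, use the vectorized .str.replace, which leaves NaN values alone. Pass regex=False so that '.' is treated as a literal dot regardless of the pandas version's default:

In [58]:
input_df['D'].str.replace('.', '', regex=False)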

Out[58]:
0     
1    3
2    3
3    0
4    3
Name: D, dtype: object
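
A minimal sketch of the vectorized route on a hypothetical Series (again, not the question's data) that includes a real NaN. It assumes a pandas version where .str.replace accepts the regex keyword.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: an empty string, strings with dots, and a real NaN.
codes = pd.Series(['', '1.2', np.nan, '3.4.5'], dtype=object)

# regex=False treats '.' as a literal dot rather than "any character".
no_dots = codes.str.replace('.', '', regex=False)

# Missing values propagate through the .str accessor untouched.
print(no_dots.tolist())
print(no_dots.isnull().tolist())   # [False, False, True, False]
```

No explicit NaN check is needed here, since the .str accessor skips missing entries on its own.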
EdChum answered Oct 02 '22 23:10