
Expected behavior of Pandas str.isnumeric()

I have a mixed-type pd.Series such as [100, 50, 0, 'foo', 'bar', 'baz'].

When I call .str.isnumeric() on it,

I get [NaN, NaN, NaN, False, False, False].

Why is this happening? Shouldn't it return True for the first three elements of the series?

Andrew asked Dec 05 '22 11:12



1 Answer

Pandas string methods follow Python methods closely:

str.isnumeric(100)    # TypeError
str.isnumeric('100')  # True
str.isnumeric('a10')  # False

Any value for which the Python method raises an error gives NaN in Pandas. As per the Python docs, str.isnumeric is only applicable to strings:

str.isnumeric()
Return true if all characters in the string are numeric characters, and there is at least one character, false otherwise.
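To make the string-only restriction concrete, here is a minimal plain-Python sketch (no Pandas involved). The sample values and the "TypeError" placeholder string are illustrative choices, not anything from the docs:

```python
# str.isnumeric only accepts str instances; anything else raises TypeError.
samples = [100, "100", "a10", ""]

outcomes = []
for value in samples:
    try:
        outcomes.append(str.isnumeric(value))
    except TypeError:
        # Non-string input: this is the case that shows up as NaN in Pandas.
        outcomes.append("TypeError")

print(outcomes)
```

Note that the empty string gives False, matching the "at least one character" wording in the quoted docs.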

As per the Pandas docs, pd.Series.str.isnumeric is equivalent to str.isnumeric:

Series.str.isnumeric()
Check whether all characters in each string in the Series/Index are numeric. Equivalent to str.isnumeric().

Your series has "object" dtype. This is an all-encompassing type that holds pointers to arbitrary Python objects, which may be a mixture of strings, integers, etc. Therefore, you should expect NaN wherever an element is not a string.
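As a quick check of the above, the following sketch rebuilds the series from the question and inspects what it actually holds. The variable names are arbitrary:

```python
import pandas as pd

# Three integers and three strings, as in the question.
s = pd.Series([100, 50, 0, "foo", "bar", "baz"])

dtype_name = str(s.dtype)                       # mixed contents give object dtype
element_types = [type(x).__name__ for x in s]   # the underlying Python types

result = s.str.isnumeric()
# True where the element was not a string and therefore came back as NaN.
nan_mask = result.isna().tolist()

print(dtype_name)
print(element_types)
print(result.tolist())
```

The NaN positions line up exactly with the int elements, while the three strings are evaluated normally and return False.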

To accommodate numeric types, you need to convert to strings explicitly, e.g. given a series s:

s.astype(str).str.isnumeric()
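
A short sketch of that conversion on the series from the question. One caveat worth knowing: after conversion, negative numbers and floats become strings such as "-5" and "1.5", which str.isnumeric rejects. If the real goal is "can this be parsed as a number", pd.to_numeric with errors="coerce" is one possible alternative; that is a suggestion beyond the original answer, and the second series below is made up for illustration:

```python
import pandas as pd

s = pd.Series([100, 50, 0, "foo", "bar", "baz"])

# Convert every element to str first, then apply the string method.
flags = s.astype(str).str.isnumeric()
print(flags.tolist())

# Hypothetical series with a negative number and a float.
mixed = pd.Series([-5, 1.5, "7", "foo"])

# "-5" and "1.5" contain non-numeric characters, so isnumeric is False.
strict = mixed.astype(str).str.isnumeric()

# Alternative: try to parse each value; unparseable ones become NaN.
parsed = pd.to_numeric(mixed, errors="coerce").notna()

print(strict.tolist())
print(parsed.tolist())
```

Which of the two checks is appropriate depends on whether "numeric" means "made only of digit-like characters" or "parseable as a number".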
jpp answered Dec 21 '22 22:12
