Pandas, numpy.where(), and numpy.nan

I want to use numpy.where() to add a column to a pandas.DataFrame. I'd like to use NaN values for the rows where the condition is false (to indicate that these values are "missing").

Consider:

>>> import numpy; import pandas
>>> df = pandas.DataFrame({'A':[1,2,3,4]}); print(df)
   A
0  1
1  2
2  3
3  4
>>> df['B'] = numpy.nan
>>> df['C'] = numpy.where(df['A'] < 3, 'yes', numpy.nan)
>>> print(df)
   A   B    C
0  1 NaN  yes
1  2 NaN  yes
2  3 NaN  nan
3  4 NaN  nan
>>> df.isna()
       A     B      C
0  False  True  False
1  False  True  False
2  False  True  False
3  False  True  False

Why does B show "NaN" but C shows "nan"? And why does DataFrame.isna() fail to detect the NaN values in C?

Should I use something other than numpy.nan inside where? None and pandas.NA both seem to work and can be detected by DataFrame.isna(), but I'm not sure these are the best choice.

Thank you!

Edit: As per @Tim Roberts and @DYZ, numpy.where returns an array with a string dtype, so numpy.nan is converted to its string form. The values in column C are actually the strings "nan", which is why DataFrame.isna() does not flag them. The question remains, however: what is the most elegant thing to do here? Should I use None? Or something else?

asked May 10 '21 21:05

Duncan MacIntyre

1 Answer

numpy.where returns a single array, so its second and third arguments are coerced to a common dtype. Since the second argument is a string, the common dtype is a string dtype, and numpy.nan is converted to the text that str() produces for it:

str(numpy.nan)
# 'nan'

As a result, the values in column C are all strings, including the "nan" entries.
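The coercion can be checked directly on the array that numpy.where returns. This is a minimal sketch; the names cond, mixed and with_none are only illustrative and do not come from the question:

```python
import numpy

cond = numpy.array([True, True, False, False])

# A str and a float have to share one dtype, so NumPy picks a Unicode string dtype.
mixed = numpy.where(cond, 'yes', numpy.nan)
print(mixed.dtype.kind)       # 'U' stands for a Unicode string dtype
print(mixed[2] == 'nan')      # True: the element is the text 'nan', not a float

# With None as the fallback, the common dtype is object, so None is kept as-is.
with_none = numpy.where(cond, 'yes', None)
print(with_none.dtype)        # object
print(with_none[2] is None)   # True
```

Because the first result holds ordinary strings, pandas has no missing value to detect in it.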

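You can pass None as the fallback instead, which produces an object array in which None is preserved. DataFrame.isna() already treats None as missing; if you specifically want numpy.nan, convert with fillna() and assign the result back (calling fillna(..., inplace=True) on df['C'] may not update df in recent pandas versions):

df['C'] = numpy.where(df['A'] < 3, 'yes', None)
df['C'] = df['C'].fillna(numpy.nan)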
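If the goal is only "a label where the condition holds, missing elsewhere", pandas can also do this without numpy.where. The sketch below is an alternative, not part of the original answer; the column name D and the variable mask are made up for illustration:

```python
import pandas

df = pandas.DataFrame({'A': [1, 2, 3, 4]})
mask = df['A'] < 3

# Option 1: map True to 'yes'. False has no entry in the dict, so it becomes missing.
df['C'] = mask.map({True: 'yes'})

# Option 2: start from an all-'yes' column and blank out the rows where mask is False.
df['D'] = pandas.Series('yes', index=df.index).where(mask)

# Both columns report the last two rows as missing.
print(df.isna())
```

Either way the missing entries are real missing values rather than strings, so DataFrame.isna() picks them up.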

answered Oct 22 '22 15:10

DYZ

Tags: python, pandas, dataframe, nan, numpy
