
Why do NaN values make min and max sensitive to order? [duplicate]

Tags: python, nan, numpy

> import numpy as np

> min(50, np.nan)
50
> min(np.nan, 50)
nan

(Same behaviour occurs with max)

I know that I can avoid this behaviour by using numpy.nanmin. But what causes the change when the order is reversed? Is min sensitive to input order?

Josh Friedlander asked Jun 29 '20 11:06



2 Answers

Yes, nan breaks proper ordering, because every ordering or equality comparison involving it (<, >, <=, >=, ==) evaluates to False. A lot of things with nan are inconsistent:

In [2]: 3.0 < float('nan')
Out[2]: False

In [3]: float('nan') < 3.0
Out[3]: False

In [4]: float('nan') == 3.0
Out[4]: False

min and max can only give you consistent results if you are working with a well-defined ordering, which numeric types do not have once nan can appear among the values.
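As a rough illustration of what "well-defined ordering" means here: for ordinary numbers, exactly one of a < b, a == b, a > b is true. The sketch below checks that property with and without nan. The helper name trichotomy_holds is hypothetical, introduced only for this example.

```python
nan = float("nan")

def trichotomy_holds(a, b):
    """Return True if exactly one of a < b, a == b, a > b is true."""
    return [a < b, a == b, a > b].count(True) == 1

ordinary = trichotomy_holds(3.0, 5.0)       # exactly one comparison is true
with_nan = trichotomy_holds(3.0, nan)       # all three comparisons are False
nan_with_itself = trichotomy_holds(nan, nan)  # nan is not even equal to itself

print(ordinary, with_nan, nan_with_itself)  # True False False
```

Since none of the three comparisons holds once nan is involved, any algorithm that relies on < alone has no consistent way to place nan relative to other values.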

juanpa.arrivillaga answered Oct 12 '22 09:10


Is min sensitive to input order?

Yes.

https://docs.python.org/3/library/functions.html#min

"If multiple items are minimal, the function returns the first one encountered."

The documentation does not specify exactly how "minimal" is defined for items that don't have a consistent order, but it's likely that min loops over the elements and uses the < operator to determine whether each new element is smaller than the smallest item found so far.

To confirm this hypothesis we can read the source code (search for builtin_min and min_max in https://github.com/python/cpython/blob/c96d00e88ead8f99bb6aa1357928ac4545d9287c/Python/bltinmodule.c ). It's slightly confusing because the implementations for min and max are combined and the variable names seem to assume a max function, but it's not too hard to follow.

It does indeed loop through the elements in order, performing each comparison with a call to PyObject_RichCompareBool with an "opid" of Py_LT, which is the C API equivalent of the Python < operator.
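The loop described above can be modelled in pure Python roughly as follows. This is a simplified sketch, not the CPython implementation: the name min_sketch is hypothetical, and the key and default arguments of the real min are left out.

```python
import math

def min_sketch(iterable):
    """Simplified model of min(): keep the first item and replace it
    only when a later item compares strictly less than it."""
    it = iter(iterable)
    try:
        smallest = next(it)
    except StopIteration:
        raise ValueError("min_sketch() arg is an empty sequence") from None
    for item in it:
        if item < smallest:  # the only comparison that is ever made
            smallest = item
    return smallest

nan = float("nan")
first = min_sketch([50, nan])   # nan < 50 is False, so 50 is kept
second = min_sketch([nan, 50])  # 50 < nan is False, so nan is kept
print(first, second)            # 50 nan
```

Because the candidate is replaced only on a strict "less than", ties also resolve to the first item encountered, which matches the sentence quoted from the documentation.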

Comparisons between NaN and numbers return false, so in a list containing numbers and NaNs, a NaN in the first position will be considered the minimum, as no number will be "less than" it. On the other hand, a NaN that is not in the first position will effectively be skipped over, as it is not "less than" any number.
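A short check of both cases with the built-in min, using only the standard library. The last line shows one possible way to get an order-independent result, by dropping NaNs with math.isnan before calling min. That filtering step is an alternative suggested here, not something min does itself.

```python
import math

nan = float("nan")

leading = min([nan, 3.0, 1.0, 2.0])   # NaN first: no number is "less than" it
middle = min([3.0, nan, 1.0, 2.0])    # NaN later: it is never "less than" the current minimum
trailing = min([3.0, 1.0, 2.0, nan])

# Order-independent alternative: ignore NaNs explicitly.
values = [nan, 3.0, 1.0, 2.0]
filtered = min(v for v in values if not math.isnan(v))

print(leading, middle, trailing, filtered)  # nan 1.0 1.0 1.0
```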

plugwash answered Oct 12 '22 09:10