I just ran into a nasty bug as a result of the following behavior:
scala> List(1.0, 2.0, 3.0, Double.NaN).min
res1: Double = NaN
scala> List(1.0, 2.0, 3.0, Double.NaN).max
res2: Double = NaN
I understand that for a pairwise comparison it may sometimes be preferable to have max(NaN, 0) = NaN
and this is probably the reason why java.lang.Double.compare
follows this convention (there seems to be an IEEE standard for that). For a collection however, I really think that this is a strange convention. After all the above collection does contain valid numbers; and these numbers have a clear maximum and minimum. In my opinion, the conception that the maximum number of the collection is not a number is a contradiction since, well, NaN is not a number, so it cannot be the maximum or minimum "number" of a collection -- unless there are no valid numbers at all; in this case it makes perfect sense that the maximum "is not a number". Semantically the min
and max
functions degenerate to a check whether the collection contains a NaN. Since there are more appropriate ways to check for the existence of NaNs (e.g. collection.find(_.isNaN)
) it would be great to maintain a semantically meaningful min/max on collections.
So my question is: What is the best approach to obtain the behavior to ignore the existence of NaNs? I see two possibilities:
Filtering NaNs before calling min/max. Since this requires explicit handling of the issue in all places and may incur performance penalties I would prefer something easier.
It would be great to have a kind of NaN-ignoring ordering which can be used as an implicit ordering wherever necessary. I tried the following:
object NanAwareOrdering extends Ordering[Double] {
def compare(x: Double, y: Double) = {
if (x.isNaN()) {
+1 // without checking x, return y < x
} else if (y.isNaN()) {
-1 // without checking y, return x < y
} else {
java.lang.Double.compare(x, y)
}
}
}
However, this approach seems to depend on whether I'm interesting in finding the min or max, i.e.:
scala> List(1.0, 2.0, 3.0, Double.NaN).min(NanAwareOrdering)
res7: Double = 1.0
scala> List(1.0, 2.0, 3.0, Double.NaN).max(NanAwareOrdering)
res8: Double = NaN
This means that I would have to have two NanAwareOrdering depending on whether I want the minimum or the maximum, which would forbid to have an implicit val
. Therefore my question is: How can I define an ordering in a way to handle both cases at once?
Update:
For the sake of completeness: In the course of analyzing the issue I realized that the premise "degenerates to a check for NaN" is actually wrong. In fact, I think it is even more ugly:
scala> List(1.0, Double.NaN).min
res1: Double = NaN
scala> List(Double.NaN, 1.0).min
res2: Double = 1.0
Disclaimer: I'll add my own answer to the question just in case anyone else is still interested in more details on the matter.
I looks like this issue is more complex than I expected. As Alexey Romanov has already pointed out, the notion of incomparability would require the max/min functions to take a partial order. Unfortunately, Alexey is also right in the point, that a general max/min function based on a partial order does not make sense: Think of the case where the partial ordering only defines relations within certain groups, but the groups themselves are completely independent from each other (for instance, the elements {a, b, c, d} with just the two relations a < b and c < d; we would have two max/min). In that regard one may even argue that formally max/min should always return two values, NaN and the respective valid min/max since NaN itself is also an extremal value in its own relation group.
So as a result of partial orders being too general/complex, the min/max functions take an Ordering
. Unfortunately a total order does not allow the notion of incomparability. Reviewing the three defining properties of a total order makes it pretty obvious that "ignoring NaNs" is formally impossible:
So when trying to come up with an implementation of an Ordering
to fulfill our desired min/max behavior it is clear that we have to violate something (and bear the consequences). The implementation of min
/max
/minBy
/maxBy
in TraversableOnce
follows the pattern (for min
):
reduceLeft((x, y) => if (cmp.lteq(x, y)) x else y)
and gteq
for the max
variants. This gave me the idea of "left biasing" the comparison, i.e.:
x <comparison_operator> NaN is always true to keep x in the reduction
NaN <comparison_operator> x is always false to inject x into the reduction
The resulting implementation of such a "left biased" ordering would look like this:
object BiasedOrdering extends Ordering[Double] {
def compare(x: Double, y: Double) = java.lang.Double.compare(x, y) // this is inconsistent, but the same goes for Double.Ordering
override def lteq(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true else compare(x, y) <= 0
override def gteq(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true else compare(x, y) >= 0
override def lt(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) false else compare(x, y) < 0
override def gt(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) false else compare(x, y) > 0
override def equiv(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true else compare(x, y) == 0
}
Currently I'm trying to find out:
I'm comparing this to Scala's default order Ordering.Double
and the following ordering which is directly derived from java.lang.Double.compare
:
object OrderingDerivedFromCompare extends Ordering[Double] {
def compare(x: Double, y: Double) = {
java.lang.Double.compare(x, y)
}
}
One interesting property of Scala's default order Ordering.Double
is that it overwrites all comparison member functions by the language's native numerical comparison operators (<
, <=
, ==
, >=
, >
) so the comparison results are identical as if we would compare directly with these operators. The following shows all possible relations between a NaN and a valid number for the three orderings:
Ordering.Double 0.0 > NaN = false
Ordering.Double 0.0 >= NaN = false
Ordering.Double 0.0 == NaN = false
Ordering.Double 0.0 <= NaN = false
Ordering.Double 0.0 < NaN = false
OrderingDerivedFromCompare 0.0 > NaN = false
OrderingDerivedFromCompare 0.0 >= NaN = false
OrderingDerivedFromCompare 0.0 == NaN = false
OrderingDerivedFromCompare 0.0 <= NaN = true
OrderingDerivedFromCompare 0.0 < NaN = true
BiasedOrdering 0.0 > NaN = true
BiasedOrdering 0.0 >= NaN = true
BiasedOrdering 0.0 == NaN = true
BiasedOrdering 0.0 <= NaN = true
BiasedOrdering 0.0 < NaN = true
Ordering.Double NaN > 0.0 = false
Ordering.Double NaN >= 0.0 = false
Ordering.Double NaN == 0.0 = false
Ordering.Double NaN <= 0.0 = false
Ordering.Double NaN < 0.0 = false
OrderingDerivedFromCompare NaN > 0.0 = true
OrderingDerivedFromCompare NaN >= 0.0 = true
OrderingDerivedFromCompare NaN == 0.0 = false
OrderingDerivedFromCompare NaN <= 0.0 = false
OrderingDerivedFromCompare NaN < 0.0 = false
BiasedOrdering NaN > 0.0 = false
BiasedOrdering NaN >= 0.0 = false
BiasedOrdering NaN == 0.0 = false
BiasedOrdering NaN <= 0.0 = false
BiasedOrdering NaN < 0.0 = false
Ordering.Double NaN > NaN = false
Ordering.Double NaN >= NaN = false
Ordering.Double NaN == NaN = false
Ordering.Double NaN <= NaN = false
Ordering.Double NaN < NaN = false
OrderingDerivedFromCompare NaN > NaN = false
OrderingDerivedFromCompare NaN >= NaN = true
OrderingDerivedFromCompare NaN == NaN = true
OrderingDerivedFromCompare NaN <= NaN = true
OrderingDerivedFromCompare NaN < NaN = false
BiasedOrdering NaN > NaN = false
BiasedOrdering NaN >= NaN = true
BiasedOrdering NaN == NaN = true
BiasedOrdering NaN <= NaN = true
BiasedOrdering NaN < NaN = false
We can see that:
OrderingDerivedFromCompare
fulfills the total order properties. Based on this result the reasoning behind java.lang.Double.compare
becomes much more clear: Placing NaN at the upper end of the total order simply avoids any contradiction!false
, while for the biased order it depends on the position. Since both lead to contradictions it is difficult to see which may lead to more severe issues.Now to our actual problem at hand, the min/max functions. For OrderingDerivedFromCompare
it is now clear what we have to obtain -- NaN is simply the largest value, so it's clear to obtain it as max, irrespective of how the elements in the list are arranged:
OrderingDerivedFromCompare List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
OrderingDerivedFromCompare List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
OrderingDerivedFromCompare List(1.0, 2.0, 3.0, Double.NaN).max = NaN
OrderingDerivedFromCompare List(Double.NaN, 1.0, 2.0, 3.0).max = NaN
Now to Scala's default ordering. I was deeply shocked to see that the situation is actually even more intricate than mentioned in my question:
Ordering.Double List(1.0, 2.0, 3.0, Double.NaN).min = NaN
Ordering.Double List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
Ordering.Double List(1.0, 2.0, 3.0, Double.NaN).max = NaN
Ordering.Double List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0
In fact the order of the elements becomes relevant (as a result of returning false
for every comparison in the reduceLeft
). "Left biasing" obviously solves this issue, leading to consistent results:
BiasedOrdering List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
BiasedOrdering List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
BiasedOrdering List(1.0, 2.0, 3.0, Double.NaN).max = 3.0
BiasedOrdering List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0
Unfortunately, I'm still not able to fully answer all questions here. Some remaining points are:
Why is the Scala's default ordering defined the way it is? Currently handling of NaNs seems to be pretty flawed. A very dangerous detail of the Ordering.Double
is that the compare
function actually delegates to java.lang.Double.compare
, while the comparison member are implemented based on the language's native comparisons. This obviously leads to inconsistent results, for instance:
Ordering.Double.compare(0.0, Double.NaN) == -1 // indicating 0.0 < NaN
Ordering.Double.lt (0.0, Double.NaN) == false // contradiction
What are potential drawbacks of the BiasedOrdering
, apart from directly evaluating any contradicting comparison? A quick check on sorted
gave the following results, which did not reveal any trouble:
Ordering.Double List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
OrderingDerivedFromCompare List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
BiasedOrdering List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
Ordering.Double List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
OrderingDerivedFromCompare List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
BiasedOrdering List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
For the time being I'll have a go with this left biased ordering. But since the nature of the problem does not allow a flawless general solution: use with care!
Update
And in terms of solutions based on an implicit class as monkjack suggested, I like the following a lot (since it does not mess with (flawed?) total orders at all, but internally converts to a clean totally ordered domain):
implicit class MinMaxNanAware(t: TraversableOnce[Double]) {
def nanAwareMin = t.minBy(x => if (x.isNaN) Double.PositiveInfinity else x)
def nanAwareMax = t.maxBy(x => if (x.isNaN) Double.NegativeInfinity else x)
}
// and now we can simply use
val goodMin = list.nanAwareMin
What about bringing an implicit into scope that would allow you to have new min/max methods on the list.
Something like:
object NanAwareMinOrdering extends Ordering[Double] {
def compare(x: Double, y: Double) = {
if (x.isNaN()) {
+1 // without checking x, return y < x
} else if (y.isNaN()) {
-1 // without checking y, return x < y
} else {
java.lang.Double.compare(x, y)
}
}
}
object NanAwareMaxOrdering extends Ordering[Double] {
....
}
implicit class MinMaxList(list:List[Double]) {
def min2 = list.min(NanAwareMinOrdering)
def max2 = list.max(NanAwareMaxOrdering)
}
List(1.0, 2.0, 3.0, Double.NaN).min2
For
val a = List(1.0, 2.0, 3.0, Double.NaN)
sort it,
a.sortWith {_ >_ }
res: List[Double] = List(3.0, 2.0, 1.0, NaN)
and so NaN
values are relegated, thus for max,
a.sortWith {_ >_ }.head
res: Double = 3.0
Likewise
a.sortWith {_ < _ }
res: List[Double] = List(1.0, 2.0, 3.0, NaN)
and so for min,
a.sortWith {_ < _ }.head
res: Double = 1.0
This answer is simply for explaining the issue, @monkjack's answer probably provides the best practical solution.
Now since Scala offers the possibility to implicitly pass such an ordering, isn't it a natural desire to pass an ordering, which can handle "incomparability" according to our demands
Ordering
in Scala represents only total orderings, i.e. ones where all elements are comparable. There is a PartialOrdering[T]
: http://www.scala-lang.org/api/2.10.3/index.html#scala.math.PartialOrdering, but there are a couple of problems:
It is not actually used anywhere in the standard library.
If you try to implement max
/maxBy
/etc. which take PartialOrdering
, you'll quickly see that it isn't generally possible except in cases like Float
/Double
where you have some elements which aren't comparable with anything and all the rest are comparable with each other (and you can decide to just ignore the incomparable elements).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With