min/max of collections containing NaN (handling incomparability in ordering)

Tags: max, scala, min

I just ran into a nasty bug as a result of the following behavior:

scala> List(1.0, 2.0, 3.0, Double.NaN).min
res1: Double = NaN

scala> List(1.0, 2.0, 3.0, Double.NaN).max
res2: Double = NaN

I understand that for a pairwise comparison it may sometimes be preferable to have max(NaN, 0) = NaN, and this is probably the reason why java.lang.Double.compare follows this convention (there seems to be an IEEE standard for that). For a collection, however, I really think this is a strange convention. After all, the above collection does contain valid numbers, and these numbers have a clear maximum and minimum.

In my opinion, the notion that the maximum number of the collection is not a number is a contradiction: NaN is not a number, so it cannot be the maximum or minimum "number" of a collection -- unless there are no valid numbers at all, in which case it makes perfect sense that the maximum "is not a number". Semantically, the min and max functions degenerate to a check whether the collection contains a NaN. Since there are more appropriate ways to check for the existence of NaNs (e.g. collection.find(_.isNaN)), it would be great to maintain a semantically meaningful min/max on collections.

So my question is: What is the best approach to obtain min/max behavior that ignores the existence of NaNs? I see two possibilities:

  1. Filtering NaNs before calling min/max (see the sketch after this list). Since this requires explicit handling of the issue in all places and may incur a performance penalty, I would prefer something easier.

  2. It would be great to have a kind of NaN-ignoring ordering which can be used as an implicit ordering wherever necessary. I tried the following:

      object NanAwareOrdering extends Ordering[Double] {
        def compare(x: Double, y: Double) = {
          if (x.isNaN()) {
            +1 // x is NaN: pretend x > y, so the valid y wins in a min reduction
          } else if (y.isNaN()) {
            -1 // y is NaN: pretend x < y, so the valid x wins in a min reduction
          } else {
            java.lang.Double.compare(x, y)
          }
        }
      }
    

    However, this approach seems to depend on whether I'm interested in finding the min or the max (the ordering ranks NaN as the largest element, so max still returns it), i.e.:

     scala> List(1.0, 2.0, 3.0, Double.NaN).min(NanAwareOrdering)
     res7: Double = 1.0
    
     scala> List(1.0, 2.0, 3.0, Double.NaN).max(NanAwareOrdering)
     res8: Double = NaN
    

    This means that I would have to have two variants of NanAwareOrdering depending on whether I want the minimum or the maximum, which rules out a single implicit val. Therefore my question is: How can I define an ordering in a way to handle both cases at once?
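For reference, the filtering approach from option 1 would look roughly like this (a sketch; note that it throws if the collection contains nothing but NaNs, since the filtered list is then empty):

val xs = List(1.0, 2.0, 3.0, Double.NaN)
val validMax = xs.filterNot(_.isNaN).max  // 3.0
val validMin = xs.filterNot(_.isNaN).min  // 1.0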

Update:

For the sake of completeness: in the course of analyzing the issue I realized that the premise "degenerates to a check for NaN" is actually wrong. In fact, the behavior is even uglier:

scala> List(1.0, Double.NaN).min
res1: Double = NaN

scala> List(Double.NaN, 1.0).min
res2: Double = 1.0
asked May 09 '14 by bluenote10


4 Answers

Disclaimer: I'll add my own answer to the question just in case anyone else is still interested in more details on the matter.

Some theory ...

It looks like this issue is more complex than I expected. As Alexey Romanov has already pointed out, the notion of incomparability would require the max/min functions to take a partial order. Unfortunately, Alexey is also right that a general max/min function based on a partial order does not make sense: think of the case where the partial ordering only defines relations within certain groups, but the groups themselves are completely independent of each other (for instance, the elements {a, b, c, d} with just the two relations a < b and c < d; we would have two maxima and two minima). In that regard one may even argue that formally max/min should always return two values, NaN and the respective valid min/max, since NaN itself is also an extremal value in its own relation group.

So as a result of partial orders being too general/complex, the min/max functions take an Ordering. Unfortunately a total order does not allow the notion of incomparability. Reviewing the three defining properties of a total order makes it pretty obvious that "ignoring NaNs" is formally impossible:

  1. If a ≤ b and b ≤ a then a = b (antisymmetry)
  2. If a ≤ b and b ≤ c then a ≤ c (transitivity)
  3. a ≤ b or b ≤ a (totality)

... and practice ...

So when trying to come up with an implementation of an Ordering to fulfill our desired min/max behavior it is clear that we have to violate something (and bear the consequences). The implementation of min/max/minBy/maxBy in TraversableOnce follows the pattern (for min):

reduceLeft((x, y) => if (cmp.lteq(x, y)) x else y)

and gteq for the max variants. This gave me the idea of "left biasing" the comparison, i.e.:

x   <comparison_operator> NaN    is always true to keep x in the reduction
NaN <comparison_operator> x      is always false to inject x into the reduction

The resulting implementation of such a "left biased" ordering would look like this:

object BiasedOrdering extends Ordering[Double] {
  def compare(x: Double, y: Double) = java.lang.Double.compare(x, y) // inconsistent with the overridden operators below, but the same goes for Ordering.Double

  override def lteq(x: Double, y: Double): Boolean  = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true  else compare(x, y) <= 0
  override def gteq(x: Double, y: Double): Boolean  = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true  else compare(x, y) >= 0
  override def lt(x: Double, y: Double): Boolean    = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) false else compare(x, y) < 0
  override def gt(x: Double, y: Double): Boolean    = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) false else compare(x, y) > 0
  override def equiv(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true  else compare(x, y) == 0

}

... analyzed:

Currently I'm trying to find out:

  • how this order compares to the default ordering,
  • where we violate total order properties,
  • and what are the potential issues.

I'm comparing this to Scala's default order Ordering.Double and the following ordering which is directly derived from java.lang.Double.compare:

object OrderingDerivedFromCompare extends Ordering[Double] {
  def compare(x: Double, y: Double) = {
    java.lang.Double.compare(x, y)
  }
}

One interesting property of Scala's default order Ordering.Double is that it overrides all comparison member functions with the language's native numerical comparison operators (<, <=, ==, >=, >), so the comparison results are identical to comparing directly with these operators. The following shows all possible relations between a NaN and a valid number for the three orderings:
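For reference, the tables below can be reproduced with a small script along the following lines (a sketch, assuming the three orderings discussed here are in scope; the column formatting is simplified):

val orderings: Seq[(String, Ordering[Double])] = Seq(
  "Ordering.Double"            -> Ordering.Double,
  "OrderingDerivedFromCompare" -> OrderingDerivedFromCompare,
  "BiasedOrdering"             -> BiasedOrdering
)
val pairs = Seq((0.0, Double.NaN), (Double.NaN, 0.0), (Double.NaN, Double.NaN))

for ((x, y) <- pairs; (name, ord) <- orderings) {
  println(f"$name%-27s $x >  $y = ${ord.gt(x, y)}")
  println(f"$name%-27s $x >= $y = ${ord.gteq(x, y)}")
  println(f"$name%-27s $x == $y = ${ord.equiv(x, y)}")
  println(f"$name%-27s $x <= $y = ${ord.lteq(x, y)}")
  println(f"$name%-27s $x <  $y = ${ord.lt(x, y)}")
}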

Ordering.Double             0.0 >  NaN = false
Ordering.Double             0.0 >= NaN = false
Ordering.Double             0.0 == NaN = false
Ordering.Double             0.0 <= NaN = false
Ordering.Double             0.0 <  NaN = false
OrderingDerivedFromCompare  0.0 >  NaN = false
OrderingDerivedFromCompare  0.0 >= NaN = false
OrderingDerivedFromCompare  0.0 == NaN = false
OrderingDerivedFromCompare  0.0 <= NaN = true
OrderingDerivedFromCompare  0.0 <  NaN = true
BiasedOrdering              0.0 >  NaN = true
BiasedOrdering              0.0 >= NaN = true
BiasedOrdering              0.0 == NaN = true
BiasedOrdering              0.0 <= NaN = true
BiasedOrdering              0.0 <  NaN = true

Ordering.Double             NaN >  0.0 = false
Ordering.Double             NaN >= 0.0 = false
Ordering.Double             NaN == 0.0 = false
Ordering.Double             NaN <= 0.0 = false
Ordering.Double             NaN <  0.0 = false
OrderingDerivedFromCompare  NaN >  0.0 = true
OrderingDerivedFromCompare  NaN >= 0.0 = true
OrderingDerivedFromCompare  NaN == 0.0 = false
OrderingDerivedFromCompare  NaN <= 0.0 = false
OrderingDerivedFromCompare  NaN <  0.0 = false
BiasedOrdering              NaN >  0.0 = false
BiasedOrdering              NaN >= 0.0 = false
BiasedOrdering              NaN == 0.0 = false
BiasedOrdering              NaN <= 0.0 = false
BiasedOrdering              NaN <  0.0 = false

Ordering.Double             NaN >  NaN = false
Ordering.Double             NaN >= NaN = false
Ordering.Double             NaN == NaN = false
Ordering.Double             NaN <= NaN = false
Ordering.Double             NaN <  NaN = false
OrderingDerivedFromCompare  NaN >  NaN = false
OrderingDerivedFromCompare  NaN >= NaN = true
OrderingDerivedFromCompare  NaN == NaN = true
OrderingDerivedFromCompare  NaN <= NaN = true
OrderingDerivedFromCompare  NaN <  NaN = false
BiasedOrdering              NaN >  NaN = false
BiasedOrdering              NaN >= NaN = true
BiasedOrdering              NaN == NaN = true
BiasedOrdering              NaN <= NaN = true
BiasedOrdering              NaN <  NaN = false

We can see that:

  • only OrderingDerivedFromCompare fulfills the total order properties. Based on this result, the reasoning behind java.lang.Double.compare becomes much clearer: placing NaN at the upper end of the total order simply avoids any contradiction!
  • Scala's default order and the biased order violate several total order properties. For comparisons involving NaN, Scala's default order always returns false, while the biased order's result depends on which side the NaN is on. Since both lead to contradictions, it is difficult to say which may cause more severe issues.

Now to our actual problem at hand, the min/max functions. For OrderingDerivedFromCompare the behavior is now clear: NaN is simply the largest value, so it is returned as the max, irrespective of how the elements in the list are arranged:

OrderingDerivedFromCompare  List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
OrderingDerivedFromCompare  List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
OrderingDerivedFromCompare  List(1.0, 2.0, 3.0, Double.NaN).max = NaN
OrderingDerivedFromCompare  List(Double.NaN, 1.0, 2.0, 3.0).max = NaN

Now to Scala's default ordering. I was deeply shocked to see that the situation is actually even more intricate than mentioned in my question:

Ordering.Double             List(1.0, 2.0, 3.0, Double.NaN).min = NaN
Ordering.Double             List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
Ordering.Double             List(1.0, 2.0, 3.0, Double.NaN).max = NaN
Ordering.Double             List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0

In fact the order of the elements becomes relevant. The reason is the reduceLeft shown above: the left element is kept only if lteq returns true, and Ordering.Double returns false for every comparison involving NaN, so lteq(1.0, NaN) == false lets NaN displace 1.0, whereas lteq(NaN, 1.0) == false lets 1.0 displace NaN. "Left biasing" obviously solves this issue, leading to consistent results:

BiasedOrdering              List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
BiasedOrdering              List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
BiasedOrdering              List(1.0, 2.0, 3.0, Double.NaN).max = 3.0
BiasedOrdering              List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0

Unfortunately, I'm still not able to fully answer all questions here. Some remaining points are:

  • Why is Scala's default ordering defined the way it is? Currently the handling of NaNs seems to be pretty flawed. A very dangerous detail of Ordering.Double is that the compare function delegates to java.lang.Double.compare, while the comparison members are implemented with the language's native comparisons. This obviously leads to inconsistent results, for instance:

    Ordering.Double.compare(0.0, Double.NaN) == -1     // indicating 0.0 < NaN
    Ordering.Double.lt     (0.0, Double.NaN) == false  // contradiction
    
  • What are potential drawbacks of the BiasedOrdering, apart from the contradictory comparison results shown above? A quick check on sorted gave the following results, which did not reveal any trouble (but see the note after this list):

    Ordering.Double             List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
    OrderingDerivedFromCompare  List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
    BiasedOrdering              List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
    
    Ordering.Double             List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
    OrderingDerivedFromCompare  List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
    BiasedOrdering              List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
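A caveat worth adding here (my own note, not verified beyond the examples above): sorted ultimately delegates to java.util.Arrays.sort, which expects a comparator consistent with a total order, so an ordering that violates these properties can in principle fail with a "Comparison method violates its general contract!" error on unlucky inputs, even though it happens to work for the lists shown.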
    

For the time being I'll have a go with this left biased ordering. But since the nature of the problem does not allow a flawless general solution: use with care!

Update

And in terms of solutions based on an implicit class as monkjack suggested, I like the following a lot (since it does not mess with (flawed?) total orders at all, but internally converts to a clean totally ordered domain):

implicit class MinMaxNanAware(t: TraversableOnce[Double]) {
  def nanAwareMin = t.minBy(x => if (x.isNaN) Double.PositiveInfinity else x)
  def nanAwareMax = t.maxBy(x => if (x.isNaN) Double.NegativeInfinity else x)
}

// and now we can simply use
val goodMin = list.nanAwareMin
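For what it's worth, a quick check of the edge cases (my own note): an all-NaN collection still yields NaN, because every element maps to the same infinity and minBy/maxBy return the first original element, while an empty collection still throws, exactly like the built-in min/max:

List(Double.NaN, Double.NaN).nanAwareMin   // NaN
// List.empty[Double].nanAwareMin          // throws UnsupportedOperationException, as min would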
answered Nov 06 '22 by bluenote10


What about bringing an implicit into scope that would allow you to have new min/max methods on the list?

Something like:

object NanAwareMinOrdering extends Ordering[Double] {
  def compare(x: Double, y: Double) = {
    if (x.isNaN()) {
      +1 // x is NaN: pretend x > y, so the valid y wins in a min
    } else if (y.isNaN()) {
      -1 // y is NaN: pretend x < y, so the valid x wins in a min
    } else {
      java.lang.Double.compare(x, y)
    }
  }
}

object NanAwareMaxOrdering extends Ordering[Double] {
  ....
}

implicit class MinMaxList(list:List[Double]) {
  def min2 = list.min(NanAwareMinOrdering)
  def max2 = list.max(NanAwareMaxOrdering)
}

List(1.0, 2.0, 3.0, Double.NaN).min2
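For completeness, the elided NanAwareMaxOrdering would presumably just mirror the min version with the NaN branches flipped, something like this (a sketch, not part of the original answer):

object NanAwareMaxOrdering extends Ordering[Double] {
  def compare(x: Double, y: Double) = {
    if (x.isNaN()) {
      -1 // x is NaN: pretend x < y, so the valid y wins in a max
    } else if (y.isNaN()) {
      +1 // y is NaN: pretend x > y, so the valid x wins in a max
    } else {
      java.lang.Double.compare(x, y)
    }
  }
}

List(1.0, 2.0, 3.0, Double.NaN).max2  // 3.0 with this sketch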

answered Nov 06 '22 by sksamuel


For

val a = List(1.0, 2.0, 3.0, Double.NaN)

sort it,

a.sortWith { _ > _ }
res: List[Double] = List(3.0, 2.0, 1.0, NaN)

and so NaN values are pushed to the end; thus, for max,

a.sortWith { _ > _ }.head
res: Double = 3.0

Likewise

a.sortWith { _ < _ }
res: List[Double] = List(1.0, 2.0, 3.0, NaN)

and so for min,

a.sortWith { _ < _ }.head
res: Double = 1.0
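A side note (my observation, not from the answer): this sorts the entire list just to read off a single element, so for large collections the ordering-based or minBy-based approaches above should be cheaper, though this version has the charm of requiring no extra definitions.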
answered Nov 06 '22 by elm


This answer is simply for explaining the issue; @monkjack's answer probably provides the best practical solution.

Now since Scala offers the possibility to implicitly pass such an ordering, isn't it a natural desire to pass an ordering which can handle "incomparability" according to our demands?

Ordering in Scala represents only total orderings, i.e. ones where all elements are comparable. There is a PartialOrdering[T]: http://www.scala-lang.org/api/2.10.3/index.html#scala.math.PartialOrdering, but there are a couple of problems:

  1. It is not actually used anywhere in the standard library.

  2. If you try to implement max/maxBy/etc. which take PartialOrdering, you'll quickly see that it isn't generally possible except in cases like Float/Double where you have some elements which aren't comparable with anything and all the rest are comparable with each other (and you can decide to just ignore the incomparable elements).
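To make point 2 a bit more concrete, a PartialOrdering[Double] that treats NaN as incomparable might look like the following sketch (my illustration, not from the answer); note that nothing in the standard collections will accept it for min/max:

object NaNIncomparableOrdering extends PartialOrdering[Double] {
  // None means "incomparable": any comparison involving NaN has no defined result.
  def tryCompare(x: Double, y: Double): Option[Int] =
    if (x.isNaN || y.isNaN) None
    else Some(java.lang.Double.compare(x, y))

  def lteq(x: Double, y: Double): Boolean =
    tryCompare(x, y).exists(_ <= 0)
}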

answered Nov 06 '22 by Alexey Romanov