Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Custom sort using spaceship operator in ruby [duplicate]

I am implementing a custom sort. There is the spaceship operator <=> to deal with sorting an array:

myArray.sort { |a, b| a <=> b }
  • a <=> b returns 1 when b is larger than a, and the two elements are swapped.
  • a <=> b returns 0 when a equals to b, and the two elements stay in the original position.
  • a <=> b returns -1 when a is less than b, and the two element stay in the original position.

So I tested with an example:

myArray = [2, 1, 7, 9, 3, 8, 0]
myArray.sort { |a, b| 1 <=> 1 } # making it always return 0
#=> [9, 1, 7, 2, 3, 8, 0]

The result is not what I expect. In my expectation, when the spaceship operator returns 0, every element would stay in the original position. Since the spaceship operator always returns 0 in the example above, the array should remain as is. However, the result is different from the original array.

Is there something I misunderstand?

Updated:

Following was the idea where my question above came from. Originally I was trying to order objects by their attributes(Let's assume the attribute is status).

For example

myObjects = [obj1, obj2,..., objn] # the objects are ordered by create_time already

myObjects.sort{|a,b|
    case a.status
        when 'failed'
            x = 1
        when 'success'
            x = 0
        when 'pending'
            x = -1
     end

     case b.status
        when 'failed'
            y = 1
        when 'success'
            y = 0
        when 'pending'
            y = -1
     end

     x <=> y  # compare two objects (a and b) by the status 
}

myObjects is ordered by created_time already, but I want to sort it again by each object's status.

For instance,

Two objects with the same created time(only taking hours and minutes into consideration here, just ignoring seconds) will be sorted again by their status, making objects with failed status be put at the end of the array.

The x and y value in the code above will depend on the object's status, and x y are compared to decide the order. If the statuses of two objects are equal (x == y), the elements should stay in the same position because they are sorted by created_time already, no needs to sort them again.

When the status of two objects are both success, x <=> y will return 0. But according to some comments, the comparison value 0 returned by the spaceship operator seems to output unpredictable order. What if myObjects contains elements all with the same status? It would cause the spaceship operator returns 0 all the time since x == y.

In my expectation, myObjects should remain the same order since the statuses are all the same, how should I correct when using the spaceship operator in my case?

Many thanks to everyone's help!

like image 249
Wilson.Wang Avatar asked Nov 30 '22 21:11

Wilson.Wang


2 Answers

Your assumption about how the sorting works is incorrect. As per the documentation #sort is not stable:

The result is not guaranteed to be stable. When the comparison of two elements returns 0, the order of the elements is unpredictable.

like image 167
ndnenkov Avatar answered Dec 02 '22 11:12

ndnenkov


Array#sort uses Quicksort algorithm, which is not stable and can produce this behaviour when elements are "equal".

Reason is in choosing and moving pivot element at every step, ruby implementation seems to choose pivot at middle for this case (but it can be chosen differently).

This is what happens in your example:

  1. pivot is chosen at element 9 at middle of array
  2. now algorithm ensures that items on left of pivot are less than it and items on the right are greater or equal, because all are "equal" - this makes everything to be in right part
  3. now recursively repeat for left(this is always empty in this case) and right partitions
  4. result is sorted_left + [pivot] + sorted_right, left is empty thus pivot is moved

Ruby core documentation mentions this:

When the comparison of two elements returns 0, the order of the elements is unpredictable.

Also spaceship operator <=> does not play any role here, you could just call myArray.sort{0} for the same effect.

Update:

From updated question it's clear that you want to sort by two attributes, this can be done several ways:

Method1: you can invent a metric/key that takes both values into account and sort by it:

 status_order = { 'success' => 1, 'failed' => 2, 'pending' => 3 }
 myObjects.sort_by{|o| "#{status_order[o.status]}_#{o.created_time}" }

This is not very optimal in terms of extreme performance, but is shorter.

Method2: implicit composite key by writing comparison rule like this:

status_order = { 'success' => 1, 'failed' => 2, 'pending' => 3 }
status_order.default = 0

myObjects.sort{|a,b|
  if a.status == b.status
    a.created_time <=> b.created_time
  else
    status_order[a.status] <=> status_order[b.status]
  end
}
like image 43
Vasfed Avatar answered Dec 02 '22 11:12

Vasfed