Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ruby: sort! and uniq! Which to run first?

Tags:

ruby

I need to run both sort! and uniq! on an array. Which is better to run first? Or is there a way to combine these into one command?

like image 828
David Oneill Avatar asked Jul 15 '11 19:07

David Oneill


People also ask

How does sort work in Ruby?

The Ruby sort method works by comparing elements of a collection using their <=> operator (more about that in a second), using the quicksort algorithm. You can also pass it an optional block if you want to do some custom sorting. The block receives two parameters for you to specify how they should be compared.

How do you sort objects in Ruby?

You can use the sort method on an array, hash, or another Enumerable object & you'll get the default sorting behavior (sort based on <=> operator) You can use sort with a block, and two block arguments, to define how one object is different than another (block should return 1, 0, or -1)


3 Answers

I made a little benchmark test with different combinations of uniq uniq! sort and sort! There are no significant differences:

                user     system      total        real
sort!.uniq!103.547000   0.172000 103.719000 (104.093750)
uniq!.sort!100.437000   0.093000 100.530000 (100.859375)
uniq.sort 100.516000   0.157000 100.673000 (101.031250)
sort.uniq 103.563000   0.062000 103.625000 (103.843750)

What you may not use is something like:

array = [1]
array.uniq!.sort!

uniq! will result in nil and sort! will throw an exception.

The benchmark I used:

require 'benchmark'
require 'date'

TEST_LOOPS = 10_000
ARRAY = []
1000.times{ 
  ARRAY << Date.new(1900 + rand(100), rand(11)+1, rand(27) + 1 ) 
}
Benchmark.bm(10) {|b|

  b.report('sort!.uniq!') {
   TEST_LOOPS.times { 
      a = ARRAY.dup
      a.sort!
      a.uniq!
   }            #Testloops
  }             #b.report

  b.report('uniq!.sort!') {
   TEST_LOOPS.times { 
      a = ARRAY.dup
      # uniq!.sort! not possible. uniq! may get nil
      a.uniq!
      a.sort!
   }            #Testloops
  }             #b.report

  b.report('uniq.sort') {
   TEST_LOOPS.times { 
      a = ARRAY.dup.uniq.sort
   }            #Testloops
  }             #b.report

  b.report('sort.uniq') {
   TEST_LOOPS.times { 
      a = ARRAY.dup.sort.uniq
   }            #Testloops
  }             #b.report

} #Benchmark
like image 54
knut Avatar answered Oct 22 '22 14:10

knut


In fact, it depends from the number of unique values. In the example of knut, the start set could include at most 365 unique values out of 1000, and the order of operations seemed without influence.

if 'uniq' significantly reduces the array size, there is a distinct advantage in running it first.

A=[]
10_000.times do
  A << rand(80)
end

Benchmark.bm(10) do |b|
  b.report "sort.uniq" do
    10_000.times {A.sort.uniq}
  end
  b.report "uniq.sort" do
    10_000.times {A.uniq.sort}
  end
end

                 user     system      total        real
sort.uniq   20.202000   0.281000  20.483000 ( 20.978098)
uniq.sort    9.298000   0.000000   9.298000 (  9.355936)

I did not test the '.uniq!.sort!' permutations, but I believe that they should follow the above result.

This example may be a little extreme, but I do not see why one should not always run '.uniq' first

like image 10
user2698903 Avatar answered Oct 22 '22 15:10

user2698903


It really doesn't matter which way you do this. I guess the uniq first so it results in less items to sort with one pass through the array. So you can do

 a=[3,3,3,3,6,7,1,1,1,1,3]
 a.uniq!
 a.sort!
like image 5
Michael Papile Avatar answered Oct 22 '22 14:10

Michael Papile