I need to run both sort!
and uniq!
on an array. Which is better to run first? Or is there a way to combine these into one command?
The Ruby sort method works by comparing elements of a collection using their <=> operator (more about that in a second), using the quicksort algorithm. You can also pass it an optional block if you want to do some custom sorting. The block receives two parameters for you to specify how they should be compared.
You can use the sort method on an array, hash, or another Enumerable object & you'll get the default sorting behavior (sort based on <=> operator) You can use sort with a block, and two block arguments, to define how one object is different than another (block should return 1, 0, or -1)
I made a little benchmark test with different combinations of uniq uniq! sort and sort! There are no significant differences:
user system total real
sort!.uniq!103.547000 0.172000 103.719000 (104.093750)
uniq!.sort!100.437000 0.093000 100.530000 (100.859375)
uniq.sort 100.516000 0.157000 100.673000 (101.031250)
sort.uniq 103.563000 0.062000 103.625000 (103.843750)
What you may not use is something like:
array = [1]
array.uniq!.sort!
uniq! will result in nil and sort! will throw an exception.
The benchmark I used:
require 'benchmark'
require 'date'
TEST_LOOPS = 10_000
ARRAY = []
1000.times{
ARRAY << Date.new(1900 + rand(100), rand(11)+1, rand(27) + 1 )
}
Benchmark.bm(10) {|b|
b.report('sort!.uniq!') {
TEST_LOOPS.times {
a = ARRAY.dup
a.sort!
a.uniq!
} #Testloops
} #b.report
b.report('uniq!.sort!') {
TEST_LOOPS.times {
a = ARRAY.dup
# uniq!.sort! not possible. uniq! may get nil
a.uniq!
a.sort!
} #Testloops
} #b.report
b.report('uniq.sort') {
TEST_LOOPS.times {
a = ARRAY.dup.uniq.sort
} #Testloops
} #b.report
b.report('sort.uniq') {
TEST_LOOPS.times {
a = ARRAY.dup.sort.uniq
} #Testloops
} #b.report
} #Benchmark
In fact, it depends from the number of unique values. In the example of knut, the start set could include at most 365 unique values out of 1000, and the order of operations seemed without influence.
if 'uniq' significantly reduces the array size, there is a distinct advantage in running it first.
A=[]
10_000.times do
A << rand(80)
end
Benchmark.bm(10) do |b|
b.report "sort.uniq" do
10_000.times {A.sort.uniq}
end
b.report "uniq.sort" do
10_000.times {A.uniq.sort}
end
end
user system total real
sort.uniq 20.202000 0.281000 20.483000 ( 20.978098)
uniq.sort 9.298000 0.000000 9.298000 ( 9.355936)
I did not test the '.uniq!.sort!' permutations, but I believe that they should follow the above result.
This example may be a little extreme, but I do not see why one should not always run '.uniq' first
It really doesn't matter which way you do this. I guess the uniq first so it results in less items to sort with one pass through the array. So you can do
a=[3,3,3,3,6,7,1,1,1,1,3]
a.uniq!
a.sort!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With