
Array Merge (Union)

I have two arrays I need to merge, and using the union (|) operator is PAINFULLY slow. Are there any other ways to accomplish an array merge?

Also, the arrays are filled with objects, not strings.

An example of one of the objects in the array:

```
#<Article id: 1,
          xml_document_id: 1,
          source: "<article><domain>events.waikato.ac</domain><excerpt...",
          created_at: "2010-02-11 01:32:46",
          updated_at: "2010-02-11 01:41:28">
```

Here, source is a short piece of XML.

EDIT

Sorry! By 'merge' I mean that duplicates must not be inserted:

```ruby
a = [1, 2, 3, 4, 5]
b = [3, 4, 5, 6, 7]
a.magic_merge(b) #=> [1, 2, 3, 4, 5, 6, 7]
```

In reality the integers are Article objects, and with those the union operator appears to take forever.
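For plain integers, Ruby's built-in Array#| already produces exactly the magic_merge result described above, and (a + b).uniq gives the same array. A minimal check, using the integer example from the question:

```ruby
a = [1, 2, 3, 4, 5]
b = [3, 4, 5, 6, 7]

# Array#| returns the union: the elements of a, followed by the elements
# of b that are not already present, keeping first-occurrence order
union = a | b

# Concatenating and then removing duplicates gives the same array
same = (a + b).uniq

p union # => [1, 2, 3, 4, 5, 6, 7]
p same  # => [1, 2, 3, 4, 5, 6, 7]
```

The question is therefore not which method to call, but why these calls are slow when the elements are Article objects.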

asked Feb 11 '10 at 03:02 by Rabbott





2 Answers

Here's a script that benchmarks two merge techniques: using the pipe operator (a1 | a2), and using concatenate-and-uniq ((a1 + a2).uniq). Two additional benchmarks time the concatenation and the uniq individually.

```ruby
require 'benchmark'

# Two arrays of one million random integers each
a1 = []; a2 = []
[a1, a2].each do |a|
  1000000.times { a << rand(999999) }
end

puts "Merge with pipe:"
puts Benchmark.measure { a1 | a2 }

puts "Merge with concat and uniq:"
puts Benchmark.measure { (a1 + a2).uniq }

# Time the two halves of (a1 + a2).uniq separately
puts "Concat only:"
puts Benchmark.measure { a1 + a2 }

puts "Uniq only:"
b = a1 + a2
puts Benchmark.measure { b.uniq }
```

On my machine (Ubuntu Karmic, Ruby 1.8.7), I get output like this (the columns are user, system, total and real time, in seconds):

```
Merge with pipe:
  1.000000   0.030000   1.030000 (  1.020562)
Merge with concat and uniq:
  1.070000   0.000000   1.070000 (  1.071448)
Concat only:
  0.010000   0.000000   0.010000 (  0.005888)
Uniq only:
  0.980000   0.000000   0.980000 (  0.981700)
```

This shows that the two techniques are very similar in speed, and that uniq accounts for nearly all of the time. That makes sense intuitively: both steps are O(n), but concatenation only copies element references, whereas uniq must hash every element and look it up among the elements already seen.
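To make the cost of uniq concrete, here is a hand-rolled union that works in the same spirit as Array#uniq: a single pass that uses a Hash as the "already seen" table. The helper name union_by_hash is hypothetical, and this is a sketch of the technique rather than MRI's actual implementation.

```ruby
# Hypothetical helper: hash-based union in one pass over a + b
def union_by_hash(a, b)
  seen = {}
  result = []
  (a + b).each do |x|     # concatenation: a cheap linear copy of references
    next if seen.key?(x)  # lookup relies on x.hash and x.eql?
    seen[x] = true
    result << x
  end
  result
end

p union_by_hash([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]) # => [1, 2, 3, 4, 5, 6, 7]
```

Every element pays for one hash computation and, on a bucket match, an eql? call, which is why the per-element cost of the objects' hash and eql? methods dominates.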

So, if you really want to speed this up, you need to look at how the hash and eql? methods are implemented for the objects in your arrays, since Array#| and Array#uniq use them to decide whether two elements are duplicates. I believe that most of the time is being spent hashing and comparing objects to ensure that no two elements of the final array are equal.
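As an illustration, the sketch below defines eql? and hash on a hypothetical plain-Ruby Article class (a stand-in, not the actual ActiveRecord model) so that two articles count as duplicates whenever their ids match. Hashing a small integer id is cheap compared with hashing long XML strings; whether this matches your model's notion of equality is an assumption you would need to check.

```ruby
# Hypothetical plain-Ruby stand-in for the Article model
class Article
  attr_reader :id, :source

  def initialize(id, source)
    @id = id
    @source = source
  end

  # Treat two articles as duplicates when their ids match
  def eql?(other)
    other.is_a?(Article) && id == other.id
  end

  # Must agree with eql?: articles that are eql? must return equal hashes.
  # Only the integer id is hashed, never the XML source.
  def hash
    id.hash
  end
end

a = [Article.new(1, "<article/>"), Article.new(2, "<article/>")]
b = [Article.new(2, "<article/>"), Article.new(3, "<article/>")]

merged = a | b
p merged.map(&:id) # => [1, 2, 3]
```

On Ruby 1.9.2 and later, (a + b).uniq { |art| art.id } deduplicates by id without redefining eql? and hash at all.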

answered Oct 02 '22 at 19:10 by Alex Reisner



Do you need the items to be in a specific order within the arrays? If not, you may want to check whether using Sets makes it faster.
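A minimal sketch of the Set approach, using the integer example from the question. Set membership is hash-based, so duplicates are dropped as elements are added. Set ordering is not something to rely on (Ruby 1.8 sets are unordered), so the result is sorted for display.

```ruby
require "set"

s1 = Set.new([1, 2, 3, 4, 5])
s2 = Set.new([3, 4, 5, 6, 7])

# Set#| (also available as Set#+ and Set#union) returns a new set
union = s1 | s2

p union.size      # => 7
p union.to_a.sort # => [1, 2, 3, 4, 5, 6, 7]
```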

Update

Adding to another answerer's code:

```ruby
require "set"
require "benchmark"

a1 = []; a2 = []
[a1, a2].each do |a|
  1000000.times { a << rand(999999) }
end

# The same kind of random data, stored in sets from the start
s1, s2 = Set.new, Set.new
[s1, s2].each do |s|
  1000000.times { s << rand(999999) }
end

puts "Merge with pipe:"
puts Benchmark.measure { a1 | a2 }

puts "Merge with concat and uniq:"
puts Benchmark.measure { (a1 + a2).uniq }

puts "Concat only:"
puts Benchmark.measure { a1 + a2 }

puts "Uniq only:"
b = a1 + a2
puts Benchmark.measure { b.uniq }

puts "Using sets"
puts Benchmark.measure { s1 + s2 }

# Includes the cost of converting both arrays to sets
puts "Starting with arrays, but using sets"
puts Benchmark.measure { s3, s4 = [a1, a2].map { |a| Set.new(a) }; s3 + s4 }
```

This gives the following output (ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0]):

```
Merge with pipe:
  1.320000   0.040000   1.360000 (  1.349563)
Merge with concat and uniq:
  1.480000   0.030000   1.510000 (  1.512295)
Concat only:
  0.010000   0.000000   0.010000 (  0.019812)
Uniq only:
  1.460000   0.020000   1.480000 (  1.486857)
Using sets
  0.310000   0.010000   0.320000 (  0.321982)
Starting with arrays, but using sets
  2.340000   0.050000   2.390000 (  2.384066)
```

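This suggests that sets may or may not be faster, depending on your circumstances. Merging two existing sets is several times faster than either array technique, but converting the arrays to sets first costs more than it saves for a single merge, so sets mainly pay off when you do many merges.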
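For the many-merges case, one way to amortize the conversion cost is to keep a single Set, merge each new batch into it in place with Set#merge, and convert back to an array only once at the end. The batches below are hypothetical sample data.

```ruby
require "set"

# Hypothetical batches of ids arriving over time
batches = [[1, 2, 3], [3, 4, 5], [5, 6, 7], [1, 7, 8]]

# Set#merge adds every element of the given enumerable to the receiver
merged = Set.new
batches.each { |batch| merged.merge(batch) }

# Convert back to an array once, only if an array is actually needed
result = merged.to_a.sort
p result # => [1, 2, 3, 4, 5, 6, 7, 8]
```

The design choice is to pay the per-element hashing cost once per incoming element, rather than re-scanning an ever-growing array on every merge as repeated calls to Array#| would.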

answered Oct 02 '22 at 19:10 by Andrew Grimm
