Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the most efficient way to deep copy an object in Ruby?

I know that serializing an object is (to my knowledge) the only way to effectively deep-copy an object (as long as it isn't stateful like IO and whatnot), but is one way particularly more efficient than another?

For example, since I'm using Rails, I could always use ActiveSupport::JSON, to_xml - and from what I can tell marshalling the object is one of the most accepted ways to do this. I'd expect that marshalling is probably the most efficient of these since it's a Ruby internal, but am I missing anything?

Edit: note that its implementation is something I already have covered - I don't want to replace existing shallow copy methods (like dup and clone), so I'll just end up likely adding Object::deep_copy, the result of which being whichever of the above methods (or any suggestions you have :) that has the least overhead.

like image 936
mway Avatar asked Apr 13 '11 01:04

mway


People also ask

Which is better deep copy or shallow copy?

Shallow Copy stores the copy of the original object and points the references to the objects. Deep copy stores the copy of the original object and recursively copies the objects as well. Shallow copy is faster. Deep copy is comparatively slower.

Which is faster shallow copy or deep copy?

Shallow copy is faster than Deep copy. Deep copy is slower than Shallow copy. 3. The changes made in the copied object also reflect the original object.

How do you copy an object in Ruby?

Ruby does provide two methods for making copies of objects, including one that can be made to do deep copies. The Object#dup method will make a shallow copy of an object. To achieve this, the dup method will call the initialize_copy method of that class.


2 Answers

I was wondering the same thing, so I benchmarked a few different techniques against each other. I was primarily concerned with Arrays and Hashes - I didn't test any complex objects. Perhaps unsurprisingly, a custom deep-clone implementation proved to be the fastest. If you are looking for quick and easy implementation, Marshal appears to be the way to go.

I also benchmarked an XML solution with Rails 3.0.7, not shown below. It was much, much slower, ~10 seconds for only 1000 iterations (the solutions below all ran 10,000 times for the benchmark).

Two notes regarding my JSON solution. First, I used the C variant, version 1.4.3. Second, it doesn't actually work 100%, as symbols will be converted to Strings.

This was all run with ruby 1.9.2p180.

#!/usr/bin/env ruby
require 'benchmark'
require 'yaml'
require 'json/ext'
require 'msgpack'

def dc1(value)
  Marshal.load(Marshal.dump(value))
end

def dc2(value)
  YAML.load(YAML.dump(value))
end

def dc3(value)
  JSON.load(JSON.dump(value))
end

def dc4(value)
  if value.is_a?(Hash)
    result = value.clone
    value.each{|k, v| result[k] = dc4(v)}
    result
  elsif value.is_a?(Array)
    result = value.clone
    result.clear
    value.each{|v| result << dc4(v)}
    result
  else
    value
  end
end

def dc5(value)
  MessagePack.unpack(value.to_msgpack)
end

value = {'a' => {:x => [1, [nil, 'b'], {'a' => 1}]}, 'b' => ['z']}

Benchmark.bm do |x|
  iterations = 10000
  x.report {iterations.times {dc1(value)}}
  x.report {iterations.times {dc2(value)}}
  x.report {iterations.times {dc3(value)}}
  x.report {iterations.times {dc4(value)}}
  x.report {iterations.times {dc5(value)}}
end

results in:

user       system     total       real
0.230000   0.000000   0.230000 (  0.239257)  (Marshal)
3.240000   0.030000   3.270000 (  3.262255)  (YAML) 
0.590000   0.010000   0.600000 (  0.601693)  (JSON)
0.060000   0.000000   0.060000 (  0.067661)  (Custom)
0.090000   0.010000   0.100000 (  0.097705)  (MessagePack)
like image 67
Evan Pon Avatar answered Oct 18 '22 03:10

Evan Pon


I think you need to add an initialize_copy method to the class you are copying. Then put the logic for the deep copy in there. Then when you call clone it will fire that method. I haven't done it but that's my understanding.

I think plan B would be just overriding the clone method:

class CopyMe
    attr_accessor :var
    def initialize var=''
      @var = var
    end    
    def clone deep= false
      deep ? CopyMe.new(@var.clone) : CopyMe.new()
    end
end

a = CopyMe.new("test")  
puts "A: #{a.var}"
b = a.clone
puts "B: #{b.var}"
c = a.clone(true)
puts "C: #{c.var}"

Output

mike@sleepycat:~/projects$ ruby ~/Desktop/clone.rb 
A: test
B: 
C: test

I'm sure you could make that cooler with a little tinkering but for better or for worse that is probably how I would do it.

like image 22
mikewilliamson Avatar answered Oct 18 '22 02:10

mikewilliamson