I have a program that will store many instances of one class, let's say up to 10.000 or more. The class instances have several properties that I need from time to time, but their most important one is the ID.
class Document attr_accessor :id def ==(document) document.id == self.id end end
Now, what is the fastest way of storing thousands of these objects?
I used to put them all into an array of Documents:
documents = Array.new documents << Document.new # etc
Now an alternative would be to store them in a Hash:
documents = Hash.new doc = Document.new documents[doc.id] = doc # etc
In my application, I mostly need to find out whether a document exists at all. Is the Hash's has_key?
function significantly faster than a linear search of the Array and the comparison of Document
objects? Are both within O(n) or is has_key?
even O(1). Will I see the difference?
Also, sometimes I need to add Documents when it is already existing. When I use an Array, I would have to check with include?
before, when I use a Hash, I'd just use has_key?
again. Same question as above.
What are your thoughts? What is the fastest method of storing large amounts of data when 90% of the time I only need to know whether the ID exists (not the object itself!)
Ruby's arrays and hashes are indexed collections. Both store collections of objects, accessible using a key. With arrays, the key is an integer, whereas hashes support any object as a key. Both arrays and hashes grow as needed to hold new elements.
In Ruby, Hash is a collection of unique keys and their values. Hash is like an Array, except the indexing is done with the help of arbitrary keys of any object type. In Hash, the order of returning keys and their value by various iterators is arbitrary and will generally not be in the insertion order.
A Hash is a collection of key-value pairs like this: "employee" = > "salary". It is similar to an Array, except that indexing is done via arbitrary keys of any object type, not an integer index.
Creating an array of hashes You are allowed to create an array of hashes either by simply initializing array with hashes or by using array. push() to push hashes inside the array. Note: Both “Key” and :Key acts as a key in a hash in ruby.
Hashes are much faster for lookups:
require 'benchmark' Document = Struct.new(:id,:a,:b,:c) documents_a = [] documents_h = {} 1.upto(10_000) do |n| d = Document.new(n) documents_a << d documents_h[d.id] = d end searchlist = Array.new(1000){ rand(10_000)+1 } Benchmark.bm(10) do |x| x.report('array'){searchlist.each{|el| documents_a.any?{|d| d.id == el}} } x.report('hash'){searchlist.each{|el| documents_h.has_key?(el)} } end # user system total real #array 2.240000 0.020000 2.260000 ( 2.370452) #hash 0.000000 0.000000 0.000000 ( 0.000695)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With