Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash "has_key" complexity in Ruby

I have a hash vars = {"a" => "Name", "b" => "Address" , "c" => "Phone"}. I want to check the performance of this line :

vars.has_key(:b)?

Is it O(1) or O(size of hash) ?

like image 314
Hellboy Avatar asked May 12 '15 13:05

Hellboy


People also ask

How does Ruby calculate Hash value?

To find a hash key by it's value, i.e. reverse lookup, one can use Hash#key . It's available in Ruby 1.9+.

What is hashing in Ruby?

Advertisements. A Hash is a collection of key-value pairs like this: "employee" = > "salary". It is similar to an Array, except that indexing is done via arbitrary keys of any object type, not an integer index.

How do you check if a Hash contains a key Ruby?

We can check if a particular hash contains a particular key by using the method has_key?(key) . It returns true or false depending on whether the key exists in the hash or not.

What is Ruby Hash object?

A Hash is a dictionary-like collection of unique keys and their values. Also called associative arrays, they are similar to Arrays, but where an Array uses integers as its index, a Hash allows you to use any object type. Hashes enumerate their values in the order that the corresponding keys were inserted.


2 Answers

Simple benchmark:

require 'benchmark'

iterations = 10_000
small      = 10
big        = 1_000_000

small_hash = {}
big_hash   = {}

(1..small).each do |i|
  small_hash[i] = i
end

(1..big).each do |i|
  big_hash[i] = i
end

Benchmark.bmbm do |bm|
  bm.report('Small Hash') do
    iterations.times { small_hash.has_key?(1) }
  end

  bm.report('Big Hash') do
    iterations.times { big_hash.has_key?(1) }
  end
end

Running the test:

$ ruby has_key_test.rb 
                 user     system      total        real
Small Hash   0.000000   0.000000   0.000000 (  0.001167)
Big Hash     0.000000   0.000000   0.000000 (  0.001171)

So yes, I think we can consider the cost constant O(1) (at least, without check the internal MRI implementation).

like image 185
markets Avatar answered Oct 23 '22 01:10

markets


The source code of has_key is (http://ruby-doc.org/core-1.9.3/Hash.html#method-i-has_key-3F)

rb_hash_has_key(VALUE hash, VALUE key)
{
    if (!RHASH(hash)->ntbl)
        return Qfalse;
    if (st_lookup(RHASH(hash)->ntbl, key, 0)) {
        return Qtrue;
    }
    return Qfalse;
}

The st_lookup has following fragment (https://github.com/ruby/ruby/blob/ca6b174078fa15f33655be704d9409fdbc4f9929/st.c#L383):

if (table->entries_packed) {
    st_index_t i = find_packed_index(table, hash_val, key);
    if (i < table->real_entries) {
        if (value != 0) *value = PVAL(table, i);
        return 1;
    }
        return 0;
    }

Which tells us that if entries_packed then ruby uses index (O(1)) otherwise it uses unindexed search (O(n)).

Value of entries_packed seems to depend on the size of hash: (https://github.com/ruby/ruby/blob/ca6b174078fa15f33655be704d9409fdbc4f9929/st.c#L41)

#define MAX_PACKED_HASH (int)(ST_DEFAULT_PACKED_TABLE_SIZE * sizeof(st_table_entry*) / sizeof(st_packed_entry))

https://github.com/ruby/ruby/blob/ca6b174078fa15f33655be704d9409fdbc4f9929/st.c#L219

tbl->entries_packed = size <= MAX_PACKED_HASH;

The size is a kind of size of index.

You can find out more details in ruby sources but its complexity is not always O(1) but depends on the size of the hash. (on the size of its index)

like image 33
lx00st Avatar answered Oct 23 '22 00:10

lx00st