I have a hash vars = {"a" => "Name", "b" => "Address" , "c" => "Phone"}
. I want to check the performance of this line :
vars.has_key(:b)?
Is it O(1) or O(size of hash) ?
To find a hash key by it's value, i.e. reverse lookup, one can use Hash#key . It's available in Ruby 1.9+.
Advertisements. A Hash is a collection of key-value pairs like this: "employee" = > "salary". It is similar to an Array, except that indexing is done via arbitrary keys of any object type, not an integer index.
We can check if a particular hash contains a particular key by using the method has_key?(key) . It returns true or false depending on whether the key exists in the hash or not.
A Hash is a dictionary-like collection of unique keys and their values. Also called associative arrays, they are similar to Arrays, but where an Array uses integers as its index, a Hash allows you to use any object type. Hashes enumerate their values in the order that the corresponding keys were inserted.
Simple benchmark:
require 'benchmark'
iterations = 10_000
small = 10
big = 1_000_000
small_hash = {}
big_hash = {}
(1..small).each do |i|
small_hash[i] = i
end
(1..big).each do |i|
big_hash[i] = i
end
Benchmark.bmbm do |bm|
bm.report('Small Hash') do
iterations.times { small_hash.has_key?(1) }
end
bm.report('Big Hash') do
iterations.times { big_hash.has_key?(1) }
end
end
Running the test:
$ ruby has_key_test.rb
user system total real
Small Hash 0.000000 0.000000 0.000000 ( 0.001167)
Big Hash 0.000000 0.000000 0.000000 ( 0.001171)
So yes, I think we can consider the cost constant O(1) (at least, without check the internal MRI implementation).
The source code of has_key
is (http://ruby-doc.org/core-1.9.3/Hash.html#method-i-has_key-3F)
rb_hash_has_key(VALUE hash, VALUE key)
{
if (!RHASH(hash)->ntbl)
return Qfalse;
if (st_lookup(RHASH(hash)->ntbl, key, 0)) {
return Qtrue;
}
return Qfalse;
}
The st_lookup
has following fragment (https://github.com/ruby/ruby/blob/ca6b174078fa15f33655be704d9409fdbc4f9929/st.c#L383):
if (table->entries_packed) {
st_index_t i = find_packed_index(table, hash_val, key);
if (i < table->real_entries) {
if (value != 0) *value = PVAL(table, i);
return 1;
}
return 0;
}
Which tells us that if entries_packed
then ruby uses index (O(1)) otherwise it uses unindexed search (O(n)).
Value of entries_packed
seems to depend on the size of hash: (https://github.com/ruby/ruby/blob/ca6b174078fa15f33655be704d9409fdbc4f9929/st.c#L41)
#define MAX_PACKED_HASH (int)(ST_DEFAULT_PACKED_TABLE_SIZE * sizeof(st_table_entry*) / sizeof(st_packed_entry))
https://github.com/ruby/ruby/blob/ca6b174078fa15f33655be704d9409fdbc4f9929/st.c#L219
tbl->entries_packed = size <= MAX_PACKED_HASH;
The size
is a kind of size of index.
You can find out more details in ruby sources but its complexity is not always O(1) but depends on the size of the hash. (on the size of its index)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With