
Why are Ruby #hash methods randomized?

Tags:

ruby

I just noticed that the return value of #hash changes each time I start up Ruby:

$ irb
2.0.0-p353 :001 > "".hash
2313425349783613115
2.0.0-p353 :002 > exit

$ irb
2.0.0-p353 :001 > "".hash
4543564897974813688
2.0.0-p353 :002 > exit

I looked at the MRI source to see why this was happening:

st_index_t
rb_str_hash(VALUE str)
{
    int e = ENCODING_GET(str);
    if (e && rb_enc_str_coderange(str) == ENC_CODERANGE_7BIT) {
        e = 0;
    }
    return rb_memhash((const void *)RSTRING_PTR(str), RSTRING_LEN(str)) ^ e;
}

It turns out rb_memhash is defined in random.c:

st_index_t
rb_memhash(const void *ptr, long len)
{
    sip_uint64_t h = sip_hash24(sipseed.key, ptr, len);
#ifdef HAVE_UINT64_T
    return (st_index_t)h;
#else
    return (st_index_t)(h.u32[0] ^ h.u32[1]);
#endif
}

And though I can't find what ruby_sip_hash24 is, I assume that it's not a deterministic function.

After a bit of messing around, I managed to find this commit by Tanaka Akira that changes rb_str_hash to use rb_memhash, giving the reason as "avoid algorithmic complexity attacks". What does that mean?

Thanks!

ucarion asked Mar 19 '23 18:03


1 Answer

As the commit message says, the change is meant to avoid algorithmic complexity attacks.

An algorithmic complexity attack is a form of computer attack that exploits known cases in which an algorithm used in a piece of software will exhibit worst case behavior. This type of attack can be used to achieve a denial-of-service.
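For a hash table, the worst case is many keys that share the same hash value. The following is a minimal sketch of that effect, not the code path MRI uses for strings: the `Key` class and `comparisons_for` helper are hypothetical names made up for this illustration, and the collisions are forced by hand instead of being computed by an attacker. It counts how often `eql?` is called while filling a `Hash`.

```ruby
# Hypothetical key type: when `colliding` is true, every key reports the
# same hash value, which is the situation an attacker tries to provoke.
class Key
  attr_reader :id

  def initialize(id, counter, colliding)
    @id = id
    @counter = counter     # one-element array shared by all keys, used as a counter
    @colliding = colliding
  end

  def hash
    @colliding ? 0 : @id.hash
  end

  def eql?(other)
    @counter[0] += 1       # record every key comparison the Hash performs
    @id == other.id
  end
end

# Insert n distinct keys and return how many eql? calls that required.
def comparisons_for(n, colliding)
  counter = [0]
  table = {}
  n.times { |i| table[Key.new(i, counter, colliding)] = true }
  counter[0]
end

spread   = comparisons_for(500, false)
collided = comparisons_for(500, true)

puts "eql? calls with well-spread hashes: #{spread}"
puts "eql? calls with identical hashes:   #{collided}"
```

With well-spread hash values almost no key comparisons are needed, while identical hash values force each insertion to be compared against the keys already stored, so the total work grows roughly quadratically with the number of keys.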

rb_memhash keys the hash with a seed (sipseed.key in the code above), so the hash result is randomized every time you start a new Ruby process. If it were not randomized, an attacker who knows the algorithm could work out inputs that trigger the worst-case behavior, for example many strings that all hash to the same value, and use them for a DoS attack.
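The per-process randomization can be checked from a script as well as from irb. This is a small sketch, assuming the current interpreter can be started again through `RbConfig.ruby`; the helper name `empty_string_hash_in_new_process` is made up for this example.

```ruby
require "rbconfig"

# Start a fresh Ruby process, have it print "".hash, and return the value.
def empty_string_hash_in_new_process
  IO.popen([RbConfig.ruby, "-e", 'print "".hash'], &:read).to_i
end

first  = empty_string_hash_in_new_process
second = empty_string_hash_in_new_process

# Within one process the value does not change between calls.
same_process_stable = "".hash == "".hash

puts "process 1: #{first}"
puts "process 2: #{second}"
puts "stable within this process: #{same_process_stable}"
```

The two child processes should print different numbers, while repeated calls inside a single process agree. This is why hash values should not be stored or compared across runs.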

xdazz answered Mar 22 '23 10:03