Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the maximum value the String#hash method can return in Ruby?

Tags:

ruby

Title says it all, what the maximum value that can be returned from 'some random string'.hash in Ruby?

The docs don't offer much insight.

like image 896
Travis Avatar asked Dec 01 '25 16:12

Travis


2 Answers

The maximum size String#hash can output appears to be the maximum size of an unsigned long in your environment.

The String#hash function is implemented in rb_str_hash():

/* string.c, l. 2290 */

st_index_t
rb_str_hash(VALUE str)
{
    int e = ENCODING_GET(str);
    if (e && rb_enc_str_coderange(str) == ENC_CODERANGE_7BIT) {
        e = 0;
    }
    return rb_memhash((const void *)RSTRING_PTR(str), RSTRING_LEN(str)) ^ e;
}

st_index_t is defined as type st_data_t:

/* st.h, l. 48 */

typedef st_data_t st_index_t;

st_data_t is an unsigned long:

/* st.h, l. 20 */

typedef unsigned long st_data_t;

Since the hash is randomly generated (using SipHash), the entire range of values possible in an unsigned long should be available. In a 64-bit environment, unsigned long will be 64-bit, of course. SipHash's output is 64-bit, so in a 32-bit environment Ruby stores its output in an array with two 32-bit unsigned integers, and rb_memhash() combines them with a bitwise XOR.

in siphash.h:

/* siphash.h, l. 14 */

#ifndef HAVE_UINT64_T
typedef struct {
    uint32_t u32[2];
} sip_uint64_t;
#define uint64_t sip_uint64_t
#else
typedef uint64_t sip_uint64_t;
#endif

rb_memhash():

/* random.c, l. 1306 */

st_index_t
rb_memhash(const void *ptr, long len)
{
    sip_uint64_t h = sip_hash24(sipseed.key, ptr, len);
    #ifdef HAVE_UINT64_T
        return (st_index_t)h;
    #else
        return (st_index_t)(h.u32[0] ^ h.u32[1]);
    #endif
}

Here's Ruby's sip_hash24(), if you want to look at the implementation.

like image 62
Zoë Sparks Avatar answered Dec 03 '25 17:12

Zoë Sparks


The Object#hash method returns a Fixnum, which:

Holds Integer values that can be represented in a native machine word (minus 1 bit).

Annoyingly, there doesn't appear to be an easy way to determine the exact max value on a particular system (there is an open feature request by Matz - #7517), so you must currently compute it yourself.

The sample code below (https://stackoverflow.com/a/736313/244128) works on some Ruby platforms but not reliably on all of them:

FIXNUM_MAX = (2**(0.size * 8 -2) -1)
FIXNUM_MIN = -(2**(0.size * 8 -2))
like image 34
maerics Avatar answered Dec 03 '25 15:12

maerics



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!