Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Consistent String#hash based only on the string's content

GOAL: Map every URL handled by a server to 0, 1, 2, or 3, distributing as uniformly as possible.

While the documentation for ruby's String#hash method says it will "return a hash based on the string‘s length and content," this clearly isn't the whole story. A given string's hash is not consistent across invocations of the interpreter:

$ irb
ruby-1.9.2-p180 :001 > "foo".hash
 => 360517580588231756 
ruby-1.9.2-p180 :002 > ^D

$ irb
ruby-1.9.2-p180 :001 > "foo".hash
 => -2716152678666510148 

This means a particular string's hash value may differ across, say, servers. Rails uses String#hash internally to map a URL path to one of four asset hosts (if the app's asset_host is so configured), but this feature is a lot less efficient than it could be because of the cross-machine inconsistencies; different servers may map the same URL to different asset hosts, reducing the effectiveness of caches, clouding skies, cooling cups of tea prematurely, besmirching the reputations of otherwise fine programmers.

Can you suggest an alternate hash function that could effectively and speedily distribute hashes across a typical app's URL space, preferably one that produces a Fixnum since, in the end, I'll want to map it into one of four asset hosts?

like image 998
Rob Davis Avatar asked Jun 30 '11 15:06

Rob Davis


4 Answers

there are lot of such functionality in ruby's digest module: http://ruby-doc.org/stdlib/libdoc/digest/rdoc/index.html

simple example:

require 'digest/sha1'
Digest::SHA1.hexdigest("some string")
like image 183
keymone Avatar answered Nov 03 '22 06:11

keymone


The easiest (and consistent) way may be this (and it's fast):

"https://www.example.com/abc/def/123?hij=345".sum % 4

That will always produce an integer 0 - 3, is quite fast, and should be fairly well distributed (though I haven't actually run tests on distribution).

like image 38
Jason Avatar answered Nov 03 '22 06:11

Jason


There is tiny library xxHash:

XXhash.xxh32('qwe') #=> 2396643526
XXhash.xxh64('qwe') #=> 9343136760830690622

Maybe it will have more collisions but it is 10x faster than SHA1:

Benchmark.bm do |x|
  n = 100_000
  str = 'qweqweqwe'
  x.report('xxhash32') { n.times { XXhash.xxh32(str) } }
  x.report('xxhash64') { n.times { XXhash.xxh64(str) } }
  x.report('hexadigest') { n.times { Digest::SHA1.hexdigest(str) } }
end;1

#       user     system      total        real
# xxhash32  0.020000   0.000000   0.020000 (  0.021948)
# xxhash64  0.040000   0.000000   0.040000 (  0.036340)
# hexadigest  0.240000   0.030000   0.270000 (  0.276443)
like image 2
Lev Lukomsky Avatar answered Nov 03 '22 08:11

Lev Lukomsky


You can try to_i(36).

"Hash me please :(".to_i(36)
=> 807137
like image 1
retro Avatar answered Nov 03 '22 07:11

retro