 

What's the best way to hash a url in ruby?

Tags:

ruby

I'm writing a web app that points to external links. I'm looking to create a non-sequential, non-guessable id for each document that I can use in the URL. I did the obvious thing: treating the URL as a string and calling String#crypt on it, but that seems to choke on any non-alphanumeric characters, like the slashes, dots and underscores.

Any suggestions on the best way to solve this problem?

Thanks!

Jason Butler asked Sep 15 '08 23:09





2 Answers

Depending on how long a string you want, you can use one of several alternatives:

require 'digest'
# Hex-encodes each byte; this is reversible, so it is not really a hash
Digest.hexencode('http://foo-bar.com/yay/?foo=bar&a=22')
# "687474703a2f2f666f6f2d6261722e636f6d2f7961792f3f666f6f3d62617226613d3232"
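To see why Digest.hexencode is an encoding rather than a hash, here is a quick round-trip check. It's a sketch; the variable names are just for illustration. Packing the hex digits back into bytes with Array#pack('H*') gives you the original URL, and the output is always twice the URL's byte length, so it hides nothing and doesn't shorten anything.

```ruby
require 'digest'

url = 'http://foo-bar.com/yay/?foo=bar&a=22'

# Each byte becomes two hex digits, so the output is twice as long as the input.
encoded = Digest.hexencode(url)

# Packing the hex digits back into bytes recovers the original URL exactly.
decoded = [encoded].pack('H*')

puts encoded.length == url.bytesize * 2 # prints true
puts decoded == url                     # prints true
```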

require 'digest/md5'
Digest::MD5.hexdigest('http://foo-bar.com/yay/?foo=bar&a=22')
# "43facc5eb5ce09fd41a6b55dba3fe2fe"

require 'digest/sha1'
Digest::SHA1.hexdigest('http://foo-bar.com/yay/?foo=bar&a=22')
# "2aba83b05dc9c2d9db7e5d34e69787d0a5e28fc5"

require 'digest/sha2'
Digest::SHA2.hexdigest('http://foo-bar.com/yay/?foo=bar&a=22')
# "e78f3d17c1c0f8d8c4f6bd91f175287516ecf78a4027d627ebcacfca822574b2"
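The length of a digest is fixed by the algorithm, not by the URL, and that is what the "how long a string" trade-off comes down to. Digest::SHA2 defaults to SHA-256. If even the 64-character hex form is too long for your URLs, one option is to Base64url-encode the raw digest bytes instead of hex-encoding them. That idea is my own addition, not part of the answer. A minimal sketch:

```ruby
require 'digest/md5'
require 'digest/sha1'
require 'digest/sha2'

url = 'http://foo-bar.com/yay/?foo=bar&a=22'

# Hex digests: two characters per byte of digest output.
md5_hex  = Digest::MD5.hexdigest(url)   # 16 bytes -> 32 chars
sha1_hex = Digest::SHA1.hexdigest(url)  # 20 bytes -> 40 chars
sha2_hex = Digest::SHA2.hexdigest(url)  # SHA-256 by default: 32 bytes -> 64 chars

# Same SHA-256 digest as raw bytes, Base64-encoded without newlines ('m0'),
# then switched to the URL-safe alphabet with '=' padding dropped.
# Equivalent to Base64.urlsafe_encode64(digest, padding: false): 32 bytes -> 43 chars.
sha2_b64 = [Digest::SHA2.digest(url)].pack('m0').tr('+/', '-_').delete('=')

[md5_hex, sha1_hex, sha2_hex, sha2_b64].each { |s| puts s.length }
# prints 32, 40, 64 and 43, one per line
```

The Base64url form uses only A-Z, a-z, 0-9, '-' and '_', so it can go straight into a path segment without escaping.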

Note that none of these makes the id unguessable: anyone who knows the URL can compute the same value. To fix that, you can combine the URL with some other (secret but static) data to salt the string:

salt = 'foobar'
Digest::SHA1.hexdigest(salt + 'http://foo-bar.com/yay/?foo=bar&a=22')
# "dbf43aff5e808ae471aa1893c6ec992088219bbb"

Now it is much harder for someone who doesn't know the salt, and has no access to your source, to generate this hash from the URL.
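Prepending a salt works. The usual construction for "hash this with a secret key", though, is an HMAC, which Ruby's OpenSSL bindings provide through OpenSSL::HMAC. This is a swapped-in technique, not what the answer above uses. URL_ID_SECRET is a hypothetical environment variable name, chosen so the key can live outside your source. A minimal sketch:

```ruby
require 'openssl'

# Hypothetical env var holding the secret key; the fallback is for local testing only.
SECRET = ENV.fetch('URL_ID_SECRET', 'dev-only-secret')

# HMAC-SHA256 keyed with SECRET: the same URL always maps to the same id,
# but the id can't be recomputed without knowing SECRET.
def url_id(url)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), SECRET, url)
end

id = url_id('http://foo-bar.com/yay/?foo=bar&a=22')
puts id.length # prints 64
```

Because the output is deterministic, you can recompute the id from the URL whenever you need it instead of looking it up, as long as the key never changes.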

manveru answered Oct 01 '22 11:10



I would also suggest looking at the different algorithms in the Digest namespace. To make the hash harder to guess, rather than (or in addition to) salting with a secret passphrase, you can also mix in a high-precision timestamp:

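require 'digest/md5'

# Prefix the URL with the current time (as a float, sub-second precision)
# so the same URL produces a different hash on every call.
def hash_url(url)
  Digest::MD5.hexdigest("#{Time.now.to_f}--#{url}")
end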

Since no hashing algorithm guarantees unique results, don't forget to check each result against previously generated hashes before assuming it is usable. Using Time.now makes the retry trivial: just call hash_url again until it returns a hash you haven't seen before.
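The check-and-retry loop described above might look like this. It's a sketch: the in-memory Set stands in for whatever store you actually keep ids in (for example, a database column with a unique index), and unique_id_for is a hypothetical helper name.

```ruby
require 'digest/md5'
require 'set'

# Same timestamp-based hash as in the answer.
def hash_url(url)
  Digest::MD5.hexdigest("#{Time.now.to_f}--#{url}")
end

# Hypothetical helper: keep hashing until we get an id not already in `seen`.
# Set#add? returns nil when the element is already present, which triggers a retry;
# since Time.now keeps advancing, a later attempt produces a different hash.
def unique_id_for(url, seen)
  loop do
    id = hash_url(url)
    return id if seen.add?(id)
  end
end

seen = Set.new
url = 'http://foo-bar.com/yay/?foo=bar&a=22'
first  = unique_id_for(url, seen)
second = unique_id_for(url, seen)
puts first == second # prints false
```

Because the timestamp changes on every call, the same URL gets a different id each time. That means you have to store the mapping from id back to URL; unlike a salted digest, the id can't be recomputed from the URL later.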

webmat answered Oct 01 '22 12:10
