I'm writing a web app that points to external links. I'm looking to create a non-sequential, non-guessable id for each document that I can use in the URL. I did the obvious thing: treating the url as a string and str#crypt on it, but that seems to choke on any non-alphanumberic characters, like the slashes, dots and underscores.
Any suggestions on the best way to solve this problem?
Thanks!
In a URL, a hash mark, number sign, or pound sign ( # ) points a browser to a specific spot in a page or website. It is used to separate the URI of an object from a fragment identifier. When you use a URL with a # , it doesn't always go to the correct part of the page or website.
The hash function is declared in Ruby's kernel class, so it's available to any class with that in its inheritance chain. The actual number will vary depending on a number of factors, but what is important is that it remains consistent across a session. It repeats this process for the rest of the key/value pairs.
As of Ruby 1.9, hashes also maintain order, but usually ordered items are stored in an array.
I know that in Ruby: - Integers, floats, and symbols are immutable. - Arrays, strings, and hashes are mutable.
Depending on how long a string you would like you can use a few alternatives:
require 'digest'
Digest.hexencode('http://foo-bar.com/yay/?foo=bar&a=22')
# "687474703a2f2f666f6f2d6261722e636f6d2f7961792f3f666f6f3d62617226613d3232"
require 'digest/md5'
Digest::MD5.hexdigest('http://foo-bar.com/yay/?foo=bar&a=22')
# "43facc5eb5ce09fd41a6b55dba3fe2fe"
require 'digest/sha1'
Digest::SHA1.hexdigest('http://foo-bar.com/yay/?foo=bar&a=22')
# "2aba83b05dc9c2d9db7e5d34e69787d0a5e28fc5"
require 'digest/sha2'
Digest::SHA2.hexdigest('http://foo-bar.com/yay/?foo=bar&a=22')
# "e78f3d17c1c0f8d8c4f6bd91f175287516ecf78a4027d627ebcacfca822574b2"
Note that this won't be unguessable, you may have to combine it with some other (secret but static) data to salt the string:
salt = 'foobar'
Digest::SHA1.hexdigest(salt + 'http://foo-bar.com/yay/?foo=bar&a=22')
# "dbf43aff5e808ae471aa1893c6ec992088219bbb"
Now it becomes much harder to generate this hash for someone who doesn't know the original content and has no access to your source.
I would also suggest looking at the different algorithms in the digest namespace. To make it harder to guess, rather than (or in addition to) salting with a secret passphrase, you can also use a precise dump of the time:
require 'digest/md5'
def hash_url(url)
Digest::MD5.hexdigest("#{Time.now.to_f}--#{url}")
end
Since the result of any hashing algorithm is not guaranteed to be unique, don't forget to check for the uniqueness of your result against previously generated hashes before assuming that your hash is usable. The use of Time.now makes the retry trivial to implement, since you only have to call until a unique hash is generated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With