Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash to create unique URL so that it is difficul to guess valid URLs in lieu of authentication

Tags:

security

url

hash

I'm considering writing a webpage which provides a feedback form for customers.
I would like customers to be able to access this form with a unique URL for any particular order; a simple example would be in the format of http://www.example.com/feedback/012345.

Sidenote: I am already familiar with URL rewriting. The creation / redirection of the URL in a given language or on a given server is outside of the scope of this question

It is unacceptable for the customer to have to authenticate (or, for that matter, even possess an account) in order to access the feedback form. But, out of concern for statistical correctness, I am not satisfied with the obvious solution above of example.com/feedback/[order_id]. This allows for anybody to change the ID in the URL and access feedback for another order.

On the other hand, I would ideally like these URLs to be able to be typed in from a printout such as a receipt, so providing a long hash like on a lot of "reset password" links is not a valid option.

Based on this, I have the following criteria :

  1. A unique URL should exist for every order
  2. This URL should be relatively human typeable, for now take this to mean "short"
  3. It should not be possible to change characters in the URL and access other orders (within reasonable realms of probability)

My line of thinking is that I should put two pieces of data in the URL. There's a lot of human readable data available on customer records which could help with this, for example customer ID, contact phone number for order, surname... While this removes the ability to change a digit or two, I don't see it markedly improving the "guessability" of a given URL to a bored attacker.

Taking the trivial example of /feedback/[surname]/[id], you can simple dictionary attack the page to receive a set of valid URLs:

for x in range(00000,99999):
    for name in ["jones","smith", ....]:
        url = "http://www.example.com/feedback/"+name+"/"+x
        if exists(url):
            print(url)



The next thing I considered, with services like tinyurl in mind, is hashing the ID number and providing a URL in the form of /feedback/[hash]/[id]
I've since done some research and learned that URL shortening services actually probably use auto-increment ID records rather than an actual mathematical hash. This method is probably not useful

If an actual hash function is used, it is important that the hash isn't obviously derived from the ID number. Providing a URL like /feedback/trpxq/53192 would be useless, because after seeing one or two of them you can trivially pull the previous record: /feedback/trpxp/53191

I was thinking then, that it would maybe be helpful if the hash included a salt so that, even with the knowledge of what hash function was used, it will not be possible to hash any old valid order ID and bring up a valid URL.

So, finally, here's the actual question:

What function would be best used to create a short, non-obvious, relatively unique alphanumeric representation of a hash based on a 7-10 digit integer ID and an arbitrary salt?

This isn't a URL shortening problem per se, so I would be satisfied if the hash part is the same length as the ID part: reduction to 10 alphanumeric characters is acceptable.

In addition, the hash does not necessarily need to be calculated every time the URL is accessed. It's possible to calculate it either when the order record is created, or when the page is first accessed for that order ID. This means the hash function doesn't strictly have to be fast.

Although there's the potential to essentially create a hash table for lookup purposes, this problem doesn't have the same restraints as a hash table: lookup is done based on an already unique value, so conflict resolution isn't strictly necessary provided it is sufficiently difficult to find another record with the same hash for a given hash.



We're now well and truly outside of the practical realms of what needs to be done to secure a feedback form - this data isn't actually that important to protect - but humour me, this is an interesting problem and I'd like to know if there's a good solution which maximizes both security and readability.

like image 866
Kiirani Avatar asked Jan 08 '13 13:01

Kiirani


1 Answers

Fundamentally, this comes down to a very simple situation.

  • You have a namespace of identifiers with n bits each. The size of this namespace is 2ⁿ.
  • Out of this namespace, you only actually issue v valid identifiers.
  • The probability of a user guessing one of the valid identifiers by chance is p = v/2ⁿ.

You have to make a compromise. A larger n will mean a smaller p but identifiers that are harder to type/copy/remember. A smaller n will make for shorter identifiers but increase p.

One thing that will help is if the identifiers are valid for a limited time only. This allows you to issue lots and lots of identifiers without increasing p too much because only a fraction of the issued identifiers will actually be valid at the same time. Your use case may or may not dictate a certain minimum reasonable lifetime for identifiers.

In any case, the identifiers should certainly be based on either random numbers or a cryptographic hash of several things including a secret key so that there is no discernable relation to meaningful pieces of information like order numbers or sequence/serial numbers. If you use a hash and you choose n smaller than the length of the output of the hash, it's perfectly all right to truncate the hash.

The most compact way of encoding the identifiers into URLs would be something like base64. The downside of base64 is of course that the encoded strings have no meaning for humans. You can use various encoding schemes based on things like generating pronouncable words or sequences of words from a dictionary. These might be more memorable to humans, but they will be really a lot longer that a compact representation for the same amount of entropy, so it's probably not worth it (especially if the URLs are usually clicked, copy&pasted, or scanned as QR codes).

like image 94
Celada Avatar answered Sep 30 '22 19:09

Celada