I am building a RESTful API. The only problem I have is how to do the authentication, since I want a stateless approach where the only information the server has is in the request itself.
So I thought I would look at how the big boys do it.
I see most services issue users/applications a token. That is then used on each subsequent request. For example Twitter and GitHub use OAuth2 and I see that they issue a bearer token. So far, so good - stateless, clean and simple:
$ curl -H "Authorization: token OAUTH-TOKEN" https://api.github.com/xyz
However, I have a question: do I store that OAUTH-TOKEN in my database to verify the user ... and if so, how?
(Edited to clarify question)
Let's say this is my database table:
user | token
abc | 123
xyz | 789
The first user wants to make an API request using their token. So they know their token is "123" and so they do:
curl -H "Authorization: Bearer 123" https://myapi.com
That's all the information my API has to go on, so it looks up WHERE token = "123", and finds out it's user "abc". Simple. All good. Response returned.
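To make that lookup concrete, here is a minimal sketch of the plain-text version. It assumes SQLite and a table called `tokens`; the table name and the helper names `bearer_token` and `user_for_token` are made up for illustration and are not from any particular framework.

```python
import sqlite3


def bearer_token(header_value):
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    scheme, _, token = header_value.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def user_for_token(conn, token):
    """Return the user that owns the token, or None if there is no match."""
    if token is None:
        return None
    row = conn.execute(
        "SELECT user FROM tokens WHERE token = ?", (token,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical table mirroring the example rows above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (user TEXT NOT NULL, token TEXT NOT NULL UNIQUE)")
conn.executemany("INSERT INTO tokens VALUES (?, ?)", [("abc", "123"), ("xyz", "789")])

print(user_for_token(conn, bearer_token("Bearer 123")))
```

The parameterised query (`?`) matters here because the token comes straight from the request.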
Ideally I want my table to be like that (simple, no overhead) so my question really was: is it a bad idea to store the tokens in the database like that?
(I guess it is because I've got in the habit of thinking this is bad just because of dealing with normal email/password rows)
So then I thought: OK, let's say I do need to hash those tokens in my table. How would I then look up the row? That is where your final question about the lookup on the hashed value comes in. I assumed there is a chance of a collision: if two tokens had the same hash and you looked up on the hashed value alone, you wouldn't know which user had made the request, surely?
Which brought me on to what additional value I would need to identify the row. Just as you need both an email and a password to identify a row, not just a password, I wondered what the equivalent would be for an API request. But yes, the simplest solutions are the best, and I think that simply passing the user name along with the token does solve the problem neatly.
So really you've answered the "how would I identify the row if I do need to store the tokens hashed" question.
The only question that remains is "Do I even need to store them hashed - and incur that overhead?"
I don't see the problem here, so I think I am misunderstanding something in your question. Here is what I think you are asking, please correct me where I am wrong:
Assuming this is true, then you can simply send more information in the auth header than just the token. An example could be:
Authorization: MyScheme base64urlEncodedUserName.base64urlEncodedAccessToken
That will allow you to perform lookups based on the user name.
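As a rough illustration, the header could be built and parsed along these lines. `MyScheme` is just the placeholder scheme name from the example above, and the helper names are hypothetical. Since "." is not part of the base64url alphabet, it is safe to split on it.

```python
import base64


def b64url_encode(text):
    """base64url-encode a string, dropping the '=' padding."""
    raw = base64.urlsafe_b64encode(text.encode("utf-8"))
    return raw.rstrip(b"=").decode("ascii")


def b64url_decode(data):
    """Decode a base64url string whose padding may have been dropped."""
    padding = "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(data + padding).decode("utf-8")


def build_header(user, token):
    """Client side: build the Authorization header value."""
    return f"MyScheme {b64url_encode(user)}.{b64url_encode(token)}"


def parse_header(value):
    """Server side: return (user, token), or None if the value is malformed."""
    scheme, _, credentials = value.partition(" ")
    if scheme != "MyScheme":
        return None
    user_part, sep, token_part = credentials.partition(".")
    if not sep or not user_part or not token_part:
        return None
    try:
        return b64url_decode(user_part), b64url_decode(token_part)
    except ValueError:  # covers bad base64 and invalid UTF-8
        return None


header = build_header("abc", "123")
print(header)
print(parse_header(header))
```

With the user name in hand, the server can fetch that user's row first and then compare the stored (possibly hashed) token against the one supplied.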
I also don't see why using the token as key is a problem, even if you store it hashed. Just hash the incoming token and perform a lookup based on the hashed value?
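A minimal sketch of that idea, assuming SHA-256 as the hash and SQLite as the store (the table and function names are made up for illustration). A plain, unsalted hash like this is only reasonable if the tokens are long random strings; "123" and "789" are placeholders taken from the question.

```python
import hashlib
import sqlite3


def hash_token(token):
    """Hex SHA-256 digest of the token; this is what gets stored and indexed."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


# Hypothetical table: only the hash of each token is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (user TEXT NOT NULL, token_hash TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO tokens VALUES (?, ?)",
    [("abc", hash_token("123")), ("xyz", hash_token("789"))],
)


def user_for_token(conn, token):
    """Hash the incoming token and look the user up by that hash."""
    row = conn.execute(
        "SELECT user FROM tokens WHERE token_hash = ?", (hash_token(token),)
    ).fetchone()
    return row[0] if row else None


print(user_for_token(conn, "123"))
```

Because the hash is deterministic, it can serve as the lookup key directly, so the per-request overhead is one hash computation.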
Edit: thanks for clarifying the question, improved response below:
Yes and no. By storing tokens instead of user passwords, you have removed the danger of exposing to an attacker a user's password that he/she may have reused on multiple sites. So it is definitely not as bad as storing passwords unhashed.
But it could still be pretty bad, depending on what kind of information or actions the token grants access to - if it is for something like a forum software, then it is probably okay. If there is credit card information involved, then it is definitely bad.
The question essentially becomes: what can an attacker do with the access token, that (s)he cannot already do, having hacked the database? If the only information available by using the token is already stored in the database and no dangerous actions can be performed using the token, then hashing the tokens gains you very little extra security.
Well, this actually raises an interesting point. Lots of people use ordinary (non-cryptographic) hash functions to hash their passwords along with a salt. This can cause collisions, yes. But if you hash your tokens, you should do so with a cryptographic hash function such as SHA-256. In that case the chance of a collision is low enough (at least if the token is long enough) that it can be ignored in practice.
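To put a rough number on "sufficiently low", here is a small sketch. It assumes tokens are generated from 32 random bytes and hashed with SHA-256, and it uses the usual birthday approximation, which treats the hash output as uniformly random. The function names are made up for illustration.

```python
import hashlib
import secrets


def new_token():
    """Generate a long random token (32 random bytes, base64url-encoded)."""
    return secrets.token_urlsafe(32)


def hash_token(token):
    """Hex SHA-256 digest of the token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def collision_bound(n_tokens, hash_bits=256):
    """Birthday approximation: P(any collision) is at most about n^2 / 2^(bits + 1)."""
    return n_tokens * n_tokens / 2 ** (hash_bits + 1)


token = new_token()
print(token, hash_token(token))
print(collision_bound(10 ** 9))  # estimate for a billion issued tokens
```

Even with a billion tokens the estimate is many orders of magnitude below anything worth handling in code.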
See "Why passwords should be hashed" and "How to safely store a password" for some nice write-ups on cryptographic hashing.