I'm building a web application where the front end is a highly-specialized search engine. Searching is handled at the main URL, and the user is passed off to a sub-directory when they click on a search result for a more detailed display. This hand-off is being done as a GET request with the primary key being passed in the query string. I seem to recall reading somewhere that exposing primary keys to the user was not a good idea, so I decided to implement reversible encryption.
I'm starting to wonder if I'm just being paranoid. The reversible encryption (base64) is probably easily broken by anybody who cares to try, makes the URLs very ugly, and also longer than they otherwise would be. Should I just drop the encryption and send my primary keys in the clear?
Primary keys must contain UNIQUE values, and cannot contain NULL values. A table can have only ONE primary key; and in the table, this primary key can consist of single or multiple columns (fields).
Primary keys should never be exposed, even UUIDs It is, therefore, an obvious thing to use as a customer number, or in a URL to identify a unique page or row.
Solution: How to avoid exposing the PK You still use a numerical, auto-incremented, value for the PK and FK, but attach to it, in a separate column, a UUID. Whenever you are dealing internally, in the same application/system with the data, the numerical primary key is used.
The primary key should be numeric or date (avoid the use of text data types). Because it takes longer for the database to compare string values than numeric values, using text columns as primary keys is likely to degrade query performance. The primary key should be compact (avoid the use of long data types).
What you're doing is basically obfuscation. A reversible encrypted (and base64 doesn't really count as encryption) primary key is still a primary key.
What you were reading comes down to this: you generally don't want to have your primary keys have any kind of meaning outside the system. This is called a technical primary key rather than a natural primary key. That's why you might use an auto number field for Patient ID rather than SSN (which is called a natural primary key).
Technical primary keys are generally favoured over natural primary keys because things that seem constant do change and this can cause problems. Even countries can come into existence and cease to exist.
If you do have technical primary keys you don't want to make them de facto natural primary keys by giving them meaning they didn't otherwise have. I think it's fine to put a primary key in a URL but security is a separate topic. If someone can change that URL and get access to something they shouldn't have access to then it's a security problem and needs to be handled by authentication and authorization.
Some will argue they should never be seen by users. I don't think you need to go that far.
On the dangers of exposing your primary key, you'll want to read "autoincrement considered harmful", By Joshua Schachter.
URLs that include an identifier will let you down for three reasons.
The first is that given the URL for some object, you can figure out the URLs for objects that were created around it. This exposes the number of objects in your database to possible competitors or other people you might not want having this information (as famously demonstrated by the Allies guessing German tank production levels by looking at the serial numbers.)
Secondly, at some point some jerk will get the idea to write a shell script with a for-loop and try to fetch every single object from your system; this is definitely no fun.
Finally, in the case of users, it allows people to derive some sort of social hierarchy. Witness the frequent hijacking and/or hacking of high-prestige low-digit ICQ ids.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With