
Hashes vs Numeric IDs

When building a web application that displays a unique identifier for a recurring entity (videos on YouTube, or book sections on a site like mine), is it better to use a uniform-length identifier such as a hash, or the item's unique key in the database (1, 2, 3, etc.)?

Aside from revealing a little information about the internals of your app, which I consider immaterial, why would a hash be better than the unique ID?

In short: Which is better to use as a publicly displayed unique identifier - a hash value, or a unique key from the database?

Edit: I'm reopening this question because Dmitriy raised the good point of not tying the naming to a database-specific property. Will this kind of coupling prevent me from optimizing or normalizing the database in the future?

The platform uses PHP/Python with ISAM on MySQL.

Karan asked Oct 13 '08 04:10




3 Answers

Unless you're trying to hide the state of your internal object ID counter, hashes are needlessly slow (to generate and to compare), needlessly long, needlessly ugly, and needlessly capable of colliding. GUIDs are also long and ugly, making them just as unsuitable for human consumption as hashes are.

For inventory-like things, just use a sequential (or sharded) counter instead. If you migrate to a different database, you will just have to initialize the new counter to a value at least as large as your largest existing record ID. Pretty much every database server gives you a way to do this.
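As a rough illustration of that migration step, the sketch below copies existing rows with their original IDs into a fresh table and lets the new counter continue from the largest one. It uses Python's built-in `sqlite3` module purely as a stand-in for the real database server; the `sections` table and the sample rows are hypothetical. (In MySQL, the counter can be set explicitly with `ALTER TABLE ... AUTO_INCREMENT = n`.)

```python
import sqlite3

# Hypothetical rows exported from the old database, IDs included.
old_rows = [(1, "Intro"), (2, "Setup"), (7, "Appendix")]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sections (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
)

# Re-insert the old rows with their original IDs so public URLs stay valid.
conn.executemany("INSERT INTO sections (id, title) VALUES (?, ?)", old_rows)

# With an INTEGER PRIMARY KEY, SQLite assigns the next ID as one more than
# the largest existing ID, so the counter resumes past the migrated rows.
cur = conn.execute("INSERT INTO sections (title) VALUES (?)", ("New section",))
new_id = cur.lastrowid

migrated_ids = [row[0] for row in conn.execute("SELECT id FROM sections ORDER BY id")]
```

The point is only that existing IDs survive the move unchanged and new records never reuse one of them.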

If you are trying to hide the state of your counter, perhaps because you're counting users and don't want competitors to know how many you have, I suggest avoiding the display of your internal IDs. If you insist on displaying them and don't want the drawbacks of a hash, you might consider using a maximal-period linear feedback shift register to generate IDs.
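To make the shift-register idea concrete, here is a minimal sketch of a 16-bit Fibonacci LFSR using the well-known maximal-period taps (16, 14, 13, 11). The width, the seed, and the function names are assumptions chosen for illustration; a real site would likely want a wider register with a tap set that is known to be maximal for that width.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_ids(seed: int = 0xACE1):
    """Yield every state of the register once, starting from a nonzero seed."""
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a nonzero 16-bit value")
    state = seed
    while True:
        yield state
        state = lfsr16_step(state)
        if state == seed:  # the cycle has wrapped around
            return


# Each new record takes the next state as its public ID.
ids = list(lfsr16_ids())
```

Every nonzero 16-bit value appears exactly once before the sequence repeats, so the IDs are short, collision-free, and not obviously sequential. This is obfuscation rather than security: anyone who knows the tap configuration can step the register forward and predict the next ID.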

ʇsәɹoɈ answered Sep 23 '22 20:09


I typically use hashes if I don't want the user to be able to guess the next ID in the series. But for your book sections, I'd stick with numeric IDs.

Xenph Yan answered Sep 23 '22 20:09


Hashes are preferable if, for example, you ever need to rebuild your database and the ordering changes. The ordinal numbers will move around, but the hashes will stay the same.

Not relying on the order you put things into a box, but on properties of the things, just seems... safer.

But watch out for collisions, obviously.
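A minimal sketch of that approach, assuming the ID is derived from stable properties of the item (here a hypothetical book title and section title) and truncated to a fixed length. The helper names, the choice of SHA-256, and the 10-character length are illustrative assumptions, not a recommendation.

```python
import hashlib


def content_id(*parts: str, length: int = 10) -> str:
    """Derive a fixed-length hex ID from an item's stable properties."""
    # Join with a separator unlikely to occur in the parts themselves,
    # so ("ab", "c") and ("a", "bc") do not hash to the same key.
    key = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:length]


def register(registry: dict, *parts: str, length: int = 10) -> str:
    """Record an item's ID, refusing to reuse an ID for a different item."""
    ident = content_id(*parts, length=length)
    existing = registry.setdefault(ident, parts)
    if existing != parts:
        raise ValueError(f"ID collision on {ident!r}: {existing} vs {parts}")
    return ident


registry = {}
first = register(registry, "My Book", "Chapter 1")
again = register(registry, "My Book", "Chapter 1")  # same item, same ID
other = register(registry, "My Book", "Chapter 2")
```

Because the ID depends only on the item's properties, rebuilding or reordering the database leaves it unchanged. The shorter the truncated ID, the more likely two items collide, which is why the registry checks before accepting one.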

SquareCog answered Sep 25 '22 20:09