I've had this dilemma for a while and couldn't find any pointers on it, although it seems that someone ought to have done it already.
What I need is to replace sequential AUTO_INCREMENT (or equivalent) primary keys with cryptographically secure (i.e. non-consecutive!) ids, while keeping the performance advantages of sequential PKs: a guaranteed-unused next ID, clusterability, etc.
A simple approach would seem to be a cryptographic pseudo-random permutation generator that maps the 2^N space onto itself one-to-one (without collisions), keyed with an initialisation vector (IV).
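One way to get such a keyed permutation of the 2^N space is a Feistel network: whatever round function is used, each round is invertible, so the whole construction is a bijection on N-bit values. The sketch below is an illustration under assumptions, not a vetted cipher: the names feistel_encrypt/feistel_decrypt are hypothetical, the round function is HMAC-SHA256 truncated to half the width, N must be even, and the secret key plays the role of the IV. For production use, a standardized format-preserving encryption mode such as NIST FF1 would be the safer choice.

```python
import hashlib
import hmac


def _round_fn(key: bytes, round_no: int, half: int, half_bits: int) -> int:
    """Keyed round function: HMAC-SHA256(round number || half), truncated to half_bits."""
    msg = round_no.to_bytes(1, "big") + half.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") & ((1 << half_bits) - 1)


def _check(value: int, bits: int, rounds: int) -> None:
    """Reject widths/round counts the byte encodings above cannot handle."""
    if bits % 2 or not 2 <= bits <= 128 or not 1 <= rounds <= 256:
        raise ValueError("bits must be even and in [2, 128]; rounds in [1, 256]")
    if not 0 <= value < (1 << bits):
        raise ValueError("value out of range for the chosen bit width")


def feistel_encrypt(x: int, key: bytes, bits: int = 32, rounds: int = 10) -> int:
    """Map x in [0, 2**bits) to a unique value in the same range."""
    _check(x, bits, rounds)
    half_bits = bits // 2
    left, right = x >> half_bits, x & ((1 << half_bits) - 1)
    for r in range(rounds):
        # (L, R) -> (R, L xor F(R)): invertible whatever F is
        left, right = right, left ^ _round_fn(key, r, right, half_bits)
    return (left << half_bits) | right


def feistel_decrypt(y: int, key: bytes, bits: int = 32, rounds: int = 10) -> int:
    """Inverse of feistel_encrypt: undo the rounds in reverse order."""
    _check(y, bits, rounds)
    half_bits = bits // 2
    left, right = y >> half_bits, y & ((1 << half_bits) - 1)
    for r in reversed(range(rounds)):
        # (L', R') -> (R' xor F(L'), L')
        left, right = right ^ _round_fn(key, r, left, half_bits), left
    return (left << half_bits) | right


if __name__ == "__main__":
    key = b"hypothetical-secret-key"  # placeholder: load a real secret instead
    for auto_id in range(1, 6):
        crypto_id = feistel_encrypt(auto_id, key)
        assert feistel_decrypt(crypto_id, key) == auto_id
        print(auto_id, "->", crypto_id)
```

Applied to the auto-increment value, e.g. crypto_id = feistel_encrypt(autoincrement_id, key), the permutation itself needs no stored state: the database's own counter supplies uniqueness, and feistel_decrypt maps a crypto_id back to the original id.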
While this could be implemented externally, it needs to store and atomically access state (the permutation position or last id), so an external implementation would be grossly inefficient: it is the equivalent of running a subsequent UPDATE table SET crypto_id = FN_CRYPTO(autoincrement_id) WHERE autoincrement_id=LAST_INSERT_ID() for every INSERT.
Do you know of any implementation like this in a database in commercial use?
Auto-increment should be used as a unique key only when no natural unique key already exists for the items you are modelling. For chemical elements you could use the atomic number, and for books the ISBN.
The advantages of numeric, auto-incremented primary keys are numerous, but the most impactful are faster queries (compact, ordered keys are cheap to index and join on) and data independence: the key stays stable across thousands of records even when other columns in the table are frequently altered.
Auto-increment generates a unique number automatically whenever a new record is inserted into a table; typically this is the primary key field.
In SQL Server, you mark a column as an IDENTITY column and SQL Server automatically generates new values for it when you insert a new row. In Oracle (before the identity columns added in 12c), you create a sequence to generate new values for a column, but there is no direct link between the sequence and the table or column.
You could use a generated/virtual column to avoid running the proposed UPDATE for every insert:
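-- pseudocode: <type> and FN_CRYPTO are placeholders for your engine's type and function
CREATE TABLE tab(
  autoincrement_id INT AUTO_INCREMENT PRIMARY KEY,
  crypto_id <type> GENERATED ALWAYS AS (FN_CRYPTO(autoincrement_id)) STORED
);
-- note: MySQL does not allow an AUTO_INCREMENT column as the base of a generated column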
-- SQL Server example; the unsalted SHA-256 hash is only an illustration and should be replaced
CREATE TABLE tab(
  autoincrement_id INT IDENTITY(1,1) PRIMARY KEY,
  crypto_id AS (HASHBYTES('SHA2_256', CAST(autoincrement_id AS NVARCHAR(MAX)))) PERSISTED
);
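To reproduce this computed column on the app side (for instance to avoid a follow-up query), the app must hash exactly the same bytes. NVARCHAR text is stored as UTF-16 little-endian, so CAST(autoincrement_id AS NVARCHAR(MAX)) feeds HASHBYTES the decimal digits as two-byte code units. A minimal Python sketch under that assumption; the function name is hypothetical:

```python
import hashlib


def sqlserver_crypto_id(autoincrement_id: int) -> bytes:
    """Mirror HASHBYTES('SHA2_256', CAST(autoincrement_id AS NVARCHAR(MAX))).

    NVARCHAR text is UTF-16LE, so encode the decimal string the same way;
    hashing the ASCII/UTF-8 bytes instead would give a different digest.
    """
    return hashlib.sha256(str(autoincrement_id).encode("utf-16-le")).digest()


if __name__ == "__main__":
    print(sqlserver_crypto_id(1).hex())
```

Because the input is just a small integer, anyone can compute this digest for 1, 2, 3 and so on and reverse it by lookup, so the plain hash hides nothing without a secret.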
More info:
SQL Server computed columns
Computed / calculated / virtual / derived columns in PostgreSQL
Column Depending on other column
EDIT by Dinu
If you use SHA, don't forget to concatenate a secret salt to the autoincrement_id; alternatively, you could use e.g. AES128 to encrypt the autoincrement_id with a secret key and IV.
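A salt concatenated by hand works, but the usual construction for mixing a secret key into a hash is HMAC. The sketch below swaps in HMAC-SHA256 (a different technique from plain concatenation) and assumes the id is hashed as its ASCII decimal string; the function name and key handling are hypothetical, and the database side would have to compute the same thing for the values to agree.

```python
import hashlib
import hmac


def keyed_crypto_id(autoincrement_id: int, secret: bytes) -> bytes:
    """HMAC-SHA256 over the id's ASCII decimal string, keyed with a secret."""
    message = str(autoincrement_id).encode("ascii")
    return hmac.new(secret, message, hashlib.sha256).digest()


if __name__ == "__main__":
    secret = b"hypothetical-secret"  # placeholder: keep real keys out of source/DDL
    print(keyed_crypto_id(1, secret).hex())
```

Like any hash, the 32-byte output is not a permutation of the original id space: collisions are astronomically unlikely but not ruled out, and, unlike with AES, the id cannot be recovered from the digest.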
Also worth noting: any DB user with access to the table DDL will have access to your secret salt/key/IV. If this concerns you, you can use a parameterized routine instead, e.g. FN_CRYPTO(id, key, iv), and send the secrets along with every insert.
To retrieve the crypto_id on the app side without a subsequent query, you would need to replicate the encryption function app-side and run it on the returned autoincrement_id. Note: if you use autoincrement_id as a byte array for AES128, be very careful about endianness, as it may differ between the DB and the app side. The alternative is SQL Server's OUTPUT clause (PostgreSQL's equivalent is RETURNING), which returns the computed value from the INSERT itself; this is engine-specific, and in ADO.NET it requires ExecuteScalar instead of ExecuteNonQuery.
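The endianness pitfall is easy to demonstrate: the same 32-bit id yields different bytes, and therefore a different AES ciphertext, depending on byte order. SQL Server, for example, converts an INT to BINARY(4) in big-endian order (CAST(1 AS BINARY(4)) is 0x00000001), while code that takes the native layout on a little-endian machine gets 01 00 00 00. The 16-byte block layout below (id in the first four bytes, zero padding) is a hypothetical choice, as is the function name:

```python
def id_to_aes_block(autoincrement_id: int, byteorder: str) -> bytes:
    """Serialize a 32-bit id and zero-pad it to one 16-byte AES block.

    byteorder is "big" or "little"; DB and app must use the same one.
    """
    return autoincrement_id.to_bytes(4, byteorder).ljust(16, b"\x00")


if __name__ == "__main__":
    big = id_to_aes_block(1, "big")
    little = id_to_aes_block(1, "little")
    print(big.hex())      # 00000001000000000000000000000000
    print(little.hex())   # 01000000000000000000000000000000
    print(big == little)  # False
```

Whichever order is chosen, fixing it explicitly on both sides (rather than relying on the platform default) keeps the DB-side and app-side ciphertexts identical.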