Replacing sequence with random number

Tags:

sql

random

postgresql

I would like to replace some of the sequences I use for IDs in my PostgreSQL database with my own custom-made ID generator. The generator would produce a random number with a check digit at the end. So this:

SELECT nextval('customers')

would be replaced by something like this:

SELECT get_new_rand_id('customer')

The function would then return a numerical value such as: [1-9][0-9]{9} where the last digit is a checksum.

My concerns are:

1. How do I make the thing atomic?
2. How do I avoid returning the same ID twice? (This would be caught when trying to insert it into a column with a unique constraint, but by then it is too late, I think.)
3. Is this a good idea at all?

Note 1: I do not want to use a UUID, since the ID is to be communicated to customers and 10 digits are far simpler to communicate than a 36-character UUID.

Note 2: The function would rarely be called with SELECT get_new_rand_id(); instead it would be assigned as the default value on the id column in place of nextval().

EDIT: OK, good discussion below! Here is some explanation of why:

1. So why would I over-complicate things this way? The purpose is to hide the primary key from the customers.

   I give each new customer a unique customerId (a generated serial number in the database). Since I communicate that number to the customer, it is a fairly simple task for my competitors to monitor my business (there are other numbers, such as invoice number and order number, that have the same properties). It is this monitoring I would like to make a little bit harder (note: not impossible, but harder).

2. Why the check digit?

   Before there was any talk of hiding the serial number, I added a check digit to the order number because there were clumsy fingers at some points in production, and my thought was that this would be a good practice to keep in the future.

After reading the discussion I can certainly see that my approach is not the best way to solve my problem, but I have no other good idea of how to solve it, so please help me out here.

1. Should I add an extra column where I put the ID I expose to the customer and keep the serial as the primary key?
2. How can I generate the ID to expose in a sane and efficient way?
3. Is the check digit necessary?
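
For reference, one common way to compute such a check digit is the Luhn algorithm. The question does not say which scheme is actually in use, so the choice of Luhn and the function name `luhn_check_digit` are assumptions made only for illustration. A minimal PL/pgSQL sketch:

```sql
-- Hypothetical helper: returns the Luhn check digit for a non-negative payload.
-- Luhn is an assumption here; the question does not name the scheme it uses.
CREATE OR REPLACE FUNCTION luhn_check_digit(payload bigint) RETURNS int AS $$
DECLARE
    digits text := payload::text;
    total  int  := 0;
    d      int;
    pos    int  := 0;   -- 0 for the rightmost payload digit
BEGIN
    IF payload < 0 THEN
        RAISE EXCEPTION 'payload must be non-negative';
    END IF;
    -- Walk the digits from right to left, doubling every second one
    -- (starting with the rightmost) and folding two-digit results.
    FOR i IN REVERSE length(digits)..1 LOOP
        d := substr(digits, i, 1)::int;
        IF pos % 2 = 0 THEN
            d := d * 2;
            IF d > 9 THEN
                d := d - 9;
            END IF;
        END IF;
        total := total + d;
        pos := pos + 1;
    END LOOP;
    RETURN (10 - total % 10) % 10;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

-- Widely used textbook payload: the full number is 79927398713.
SELECT luhn_check_digit(7992739871);   -- returns 3
```

A 10-digit ID of the form described above would then be a 9-digit payload with the check digit appended, for example `payload * 10 + luhn_check_digit(payload)`.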

asked Nov 06 '09 12:11

UlfR

4 Answers

For generating unique and random-looking identifiers from a serial, using a cipher might be a good idea. Because a cipher's output is bijective (there is a one-to-one mapping between input and output values), you will not have any collisions, unlike with hashes. This means your identifiers don't have to be as long as hashes.

Most cryptographic ciphers work on 64-bit or larger blocks, but the PostgreSQL wiki has an example PL/pgSQL procedure for a "non-cryptographic" cipher function that works on the (32-bit) int type. Disclaimer: I have not tried using this function myself.

To use it for your primary keys, run the CREATE FUNCTION call from the wiki page, and then on your empty tables do:

ALTER TABLE foo ALTER COLUMN foo_id SET DEFAULT pseudo_encrypt(nextval('foo_foo_id_seq')::int);

And voila!

pg=> insert into foo (foo_id) values(default);
pg=> insert into foo (foo_id) values(default);
pg=> insert into foo (foo_id) values(default);
pg=> select * from foo;
  foo_id   
------------
 1241588087
 1500453386
 1755259484
(3 rows)
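
If you want to convince yourself of the no-collision property before relying on it, a quick sanity check along these lines may help. It assumes the `pseudo_encrypt` function from the wiki page has already been created; the range of 100000 inputs is an arbitrary choice.

```sql
-- Assumes pseudo_encrypt(int) from the PostgreSQL wiki is installed.
-- If the mapping is one-to-one, both counts should be equal.
SELECT count(*)                          AS inputs,
       count(DISTINCT pseudo_encrypt(i)) AS distinct_outputs
FROM generate_series(1, 100000) AS i;
```

This only tests the sampled range, so it is a smoke test rather than a proof of bijectivity.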

answered Oct 05 '22 01:10

intgr

I added my comment to your question and then realized that I should have explained myself better... My apologies.

You could have a second key - not the primary key - that is visible to the user. That key could use the primary key as the seed for the hash function you describe and be the one that you use to do lookups. It would be generated by a trigger after insert (which is much simpler than trying to ensure atomicity of the operation), and it is the key that you share with your clients, never the PK.

I know there is debate (although I can't understand why) about whether PKs should be invisible to user applications. Modern database design practice and my personal experience both suggest that PKs should NOT be visible to users. They tend to attach meaning to them and, over time, that is a very bad thing, regardless of whether the key has a check digit or not.

Your joins will still be done using the PK. This other generated key is just supposed to be used for client lookups. They are the face, the PK is the guts.

Hope that helps.

Edit: FWIW, there is little to be said about "right" or "wrong" in database design. Sometimes it boils down to a choice. I think the choice you face will be better served by leaving the PK alone and creating a secondary key - just that.
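
The secondary-key idea in this answer could be sketched roughly as follows. The table, column, and trigger names are hypothetical, and the derivation uses the `pseudo_encrypt` function from the PostgreSQL wiki (mentioned in another answer here) as a stand-in for whatever function you settle on; it must be installed first. The sketch also uses a BEFORE INSERT trigger rather than the AFTER INSERT trigger described above, because the serial default has already been applied to the new row when a BEFORE ROW trigger fires, so no follow-up UPDATE is needed.

```sql
-- Hypothetical table: "id" is the internal key, "customer_no" is what customers see.
CREATE TABLE customers (
    id          serial PRIMARY KEY,
    customer_no bigint UNIQUE,
    name        text NOT NULL
);

-- Assumes pseudo_encrypt(int) from the PostgreSQL wiki is installed.
CREATE OR REPLACE FUNCTION set_customer_no() RETURNS trigger AS $$
BEGIN
    -- NEW.id already holds the sequence value at this point.
    NEW.customer_no := pseudo_encrypt(NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_set_customer_no
    BEFORE INSERT ON customers
    FOR EACH ROW EXECUTE PROCEDURE set_customer_no();

-- Client-facing lookups go through customer_no; joins keep using id.
INSERT INTO customers (name) VALUES ('Example Ltd');
SELECT id, customer_no FROM customers WHERE name = 'Example Ltd';
```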

answered Oct 05 '22 00:10

cethegeek

I think you are way over-complicating this. Why not let the database do what it does best and let it take care of atomicity and ensuring that the same ID is not used twice? Why not use a PostgreSQL SERIAL type and get an autogenerated surrogate primary key, just like an integer IDENTITY column in SQL Server or DB2? Use that on the column instead. Plus, it will be faster than your user-defined function.

I concur regarding hiding this surrogate primary key and using an exposed secondary key (with a unique constraint on it) to look up clients in your interface.

Are you using a sequence because you need a unique identifier across several tables? This is usually an indication that you need to rethink your table design, and those several tables should perhaps be combined into one, with an autogenerated surrogate primary key.

Also see here

answered Oct 05 '22 00:10

Allen

How you generate the random and unique IDs is a useful question - but you seem to be making a counterproductive assumption about when to generate them!

My point is that you do not need to generate these IDs at the time of creating your rows, because they are essentially independent of the data being inserted.

What I do is pre-generate random IDs for future use. That way I can take my own sweet time and absolutely guarantee they are unique, and there's no processing to be done at the time of the insert.

For example I have an orders table with order_id in it. This id is generated on the fly when the user enters the order, incrementally 1,2,3 etc forever. The user does not need to see this internal id.

Then I have another table - random_ids with (order_id, random_id). I have a routine that runs every night which pre-loads this table with enough rows to more than cover the orders that might be inserted in the next 24 hours. (If I ever get 10000 orders in one day I'll have a problem - but that would be a good problem to have!)

This approach guarantees uniqueness and takes any processing load away from the insert transaction and into the batch routine, where it does not affect the user.
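
A rough sketch of what such a batch routine might look like is below. The table layout follows the (order_id, random_id) pair described above; the 9-digit range, the batch size of 10000, and the oversampling to 12000 candidates are assumptions chosen for illustration, and the statement is meant to be run on its own (for example from a nightly job), not concurrently with another top-up.

```sql
-- Mapping table as described above; the column types are assumptions.
CREATE TABLE random_ids (
    order_id  bigint PRIMARY KEY,
    random_id bigint NOT NULL UNIQUE
);

-- Batch top-up: reserve random ids for the next 10000 order_ids.
WITH candidates AS (
    -- Draw more 9-digit candidates than needed; DISTINCT drops
    -- duplicates within this batch.
    SELECT DISTINCT 100000000 + floor(random() * 900000000)::bigint AS random_id
    FROM generate_series(1, 12000)
),
fresh AS (
    -- Discard candidates already handed out by an earlier batch.
    SELECT c.random_id
    FROM candidates c
    WHERE NOT EXISTS (
        SELECT 1 FROM random_ids r WHERE r.random_id = c.random_id
    )
    ORDER BY random()
    LIMIT 10000
),
base AS (
    -- Continue numbering after the last order_id that is already covered.
    SELECT coalesce(max(order_id), 0) AS last_id FROM random_ids
)
INSERT INTO random_ids (order_id, random_id)
SELECT base.last_id + row_number() OVER (ORDER BY random()),
       fresh.random_id
FROM fresh CROSS JOIN base;

-- At display time, the public id for an order is a simple lookup.
SELECT random_id FROM random_ids WHERE order_id = 42;
```

The shuffle via `ORDER BY random()` is there so that the pairing of order_id to random_id does not follow the sorted order that DISTINCT may produce. If fewer than 10000 fresh candidates survive, the batch simply inserts fewer rows.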

answered Oct 05 '22 01:10

Rob Beer
