
PostgreSQL: going to run out of IDs in integer columns

Problem

We are building a web application in Java on top of PostgreSQL. It's quite large and successful, and it should be able to run for a few more years at least.

Unfortunately, we (well, I) made a grave mistake in the early stages of the design process: all the database IDs are integers, handed out from one shared sequence.

Java's max int is 2^31-1, so approximately 2 billion. The same goes for PostgreSQL's integer type. The system is currently eating up ~10k IDs every day, and the rate is going up as we gain new users.

One day, the IDs will run out and overflow.

Question

We are looking for ways to fix the situation. Let's get the obvious one out of the way immediately: switching to Java's long and Postgres' bigint is a clean solution, but it's a ton of work. We need to postpone it as much as possible.

Some ideas we've had so far:

  • Don't use one sequence for everything, give each table its own sequence.
    • Pros: this gives us up to N times more time, where N is the number of tables.
    • Cons: we would lose the property that every row has a globally unique ID.
  • Stop using sequence IDs for some of the tables. For example, a table with customer events doesn't really need its own ID: (customer, timestamp) is a perfectly valid primary key.
    • Pros: some of our biggest ID-hogs could be changed this way.
    • Cons: nontrivial amount of work.
  • Stop wasting IDs on empty records. This happens with some sub-tables, like customer contact information. Having the record always present makes the code simpler, but it means many customers have an empty contact information record stored.
    • Pros: some of our biggest ID-hogs could be fixed this way.
    • Cons: we lose the code simplicity.
  • Every new table must use long / bigint with a new sequence.
    • Pros: at least we don't make it worse.
    • Cons: the contact surfaces with the rest of the code will be ugly.

Under these constraints, what other approaches will delay the ID depletion?
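To illustrate the second idea, a natural composite key could replace the sequence ID. A sketch (the table and column names here are made up, not our real schema):

```sql
-- Hypothetical customer-events table: no sequence ID at all,
-- the (customer, timestamp) pair serves as the primary key.
CREATE TABLE customer_events (
    customer_id integer     NOT NULL REFERENCES customers (id),
    happened_at timestamptz NOT NULL,
    payload     text,
    PRIMARY KEY (customer_id, happened_at)
);
```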

Matej asked Feb 26 '16 14:02

2 Answers

Switching to long is far from a clean solution. If you expect to keep growing, there's really only one sensible option: UUIDs (yes, PostgreSQL comes with a uuid data type).

At 128 bits, it's the size of four integers, but you don't want to go through the whole application in a few years and do all this again, do you? UUIDs will also keep working when you grow large enough to need to shard your data. You won't be able to have a shared sequence then, which is exactly why UUIDs make sense.

As a bonus you can even keep your unique property on each row.
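A sketch of what a UUID-keyed table might look like (the table names are hypothetical; `gen_random_uuid()` comes from the pgcrypto extension on PostgreSQL versions before 13, where it became built-in):

```sql
-- Needed on PostgreSQL < 13 for gen_random_uuid().
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- UUID primary key generated in the database; no sequence involved,
-- so inserts keep working even across shards.
CREATE TABLE reservations (
    id      uuid        PRIMARY KEY DEFAULT gen_random_uuid(),
    room_id uuid        NOT NULL,
    made_at timestamptz NOT NULL DEFAULT now()
);
```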


Migration isn't that hard either: adding a nullable column in PostgreSQL is cheap, so you can first add the column and then backfill it online in batches, updating a few thousand records at a time, so you don't have downtime.

Then you can exercise the same code against both foreign keys and compare the results. Does Java have anything similar to Ruby's laboratory or GitHub's scientist libraries?

Will it be a ton of work? Yes, but that's obviously a good sign, if you have an application that's so popular.

I also hope you've learned a lesson about using the same sequence for all the tables. Honestly, I don't really see the added value in that. If you want to know which table an ID belongs to, you can simply name the primary keys differently (e.g. room_id, reservation_id, etc.).

hruske answered Oct 31 '22 02:10


Since asking this question, I have found a nice way to fix half of the problem - the database side. So, for future generations, here is how to do it.

  1. Find all the DB columns of type integer or integer[]. The 'ARRAY' filter below matches every array type, so check the results manually and remove columns of other array types, for example text[].

    SELECT *
    FROM information_schema.columns cls
    JOIN information_schema.tables tbl
      ON cls.table_schema = tbl.table_schema
      AND cls.table_name = tbl.table_name
    WHERE
      cls.table_schema = '<my schema>'
      AND (cls.data_type = 'integer' OR cls.data_type = 'ARRAY')
      AND tbl.table_type = 'BASE TABLE';
    
  2. Prepare a data type change DDL for each of those columns:

    ALTER TABLE <one of the tables found> 
    ALTER COLUMN <one of its integral columns> 
    TYPE bigint;
    
  3. This works beautifully, except for the VIEWs: PostgreSQL refuses to change the type of a column that a view depends on. I need to re-create all of them - the sequence will be

    1. Drop all the views.
    2. Alter column types.
    3. Re-create all the views.
  4. Stop the application, run the upgrade script from step 3, fix slow queries by running VACUUM and ANALYZE on all the tables.
  5. Run tests and fix issues in the source code - for example, bigint[] cannot be cast to integer[].
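Rather than writing every statement from step 2 by hand, they can be generated from the step 1 query (a sketch using PostgreSQL's format() with %I for safe quoting; note that array columns need bigint[], not bigint):

```sql
SELECT format('ALTER TABLE %I.%I ALTER COLUMN %I TYPE %s;',
              cls.table_schema, cls.table_name, cls.column_name,
              CASE WHEN cls.data_type = 'ARRAY'
                   THEN 'bigint[]' ELSE 'bigint' END)
FROM information_schema.columns cls
JOIN information_schema.tables tbl
  ON cls.table_schema = tbl.table_schema
  AND cls.table_name = tbl.table_name
WHERE cls.table_schema = '<my schema>'
  AND (cls.data_type = 'integer' OR cls.data_type = 'ARRAY')
  AND tbl.table_type = 'BASE TABLE';
```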

How to export/backup VIEWs only?
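One way to capture them (a sketch) is to read the definitions back from information_schema before dropping anything, and save the output as the re-create script:

```sql
-- Emit a CREATE statement for every view in the schema.
SELECT format('CREATE OR REPLACE VIEW %I.%I AS %s',
              table_schema, table_name, view_definition)
FROM information_schema.views
WHERE table_schema = '<my schema>';
```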

Matej answered Oct 31 '22 02:10