
How do I create custom sequence in PostgreSQL based on date of row creation?

I am in the process of replacing a legacy order management application for my employer. One of the specs for the new system is that the order numbering system remain in place. Right now, our order numbers are formatted like so:

  • The first four digits are the current year
  • The next two digits are the current month
  • The next (and last) four digits are a counter that increments by one each time an order is placed in that month.

For example, the first order placed in June 2014 would have order number 2014060001. The next order placed would have order number 2014060002 and so on.

This order number will need to be the primary ID in the Orders table. It appears that I need to set up a custom sequence for PostgreSQL to use when assigning the primary key; however, the only documentation I can find on creating custom sequences is very basic (how to increment by two instead of one, etc.).

How do I create a custom sequence based on the date as described above?

rockusbacchus asked Jun 10 '14 01:06


2 Answers

You can set your manually created sequence to a specific value with setval(), computing that value with the EXTRACT() function:

SELECT setval('my_sequence',
  (EXTRACT(YEAR FROM now())::integer * 1000000) +
  (EXTRACT(MONTH FROM now())::integer * 10000)
);

The next order entered will take the next value in the sequence, i.e. YYYYMM0001 etc.
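
As a quick check of the arithmetic, here is a minimal sketch with the date hard-coded to June 2014 (the literal 2014060000 stands in for the EXTRACT() expression; my_sequence is the sequence name used above):

```sql
CREATE SEQUENCE my_sequence;

-- June 2014: 2014 * 1000000 + 6 * 10000 = 2014060000
SELECT setval('my_sequence', 2014060000);

-- Two-argument setval() marks the value as already used,
-- so the next call returns the value plus one.
SELECT nextval('my_sequence');  -- 2014060001
SELECT nextval('my_sequence');  -- 2014060002
```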

The trick is when to update the sequence value. You could do it the hard way inside PG and write a BEFORE INSERT trigger on your orders table that checks if this is the first record in a new month:

CREATE FUNCTION before_insert_order() RETURNS trigger AS $$
DECLARE
  base_val  integer;
BEGIN
  -- base_val is the minimal value of the sequence for the current month: YYYYMM0000
  base_val := (EXTRACT(YEAR FROM now())::integer * 1000000) +
              (EXTRACT(MONTH FROM now())::integer * 10000);

  -- So if the sequence is less, then update it. Read last_value from the
  -- sequence itself: currval() raises an error in a session that has not
  -- called nextval() yet.
  IF (SELECT last_value FROM my_sequence) < base_val THEN
    PERFORM setval('my_sequence', base_val);
  END IF;

  -- Now assign the order id and continue with the insert
  NEW.id := nextval('my_sequence');
  RETURN NEW;
END; $$ LANGUAGE plpgsql;

CREATE TRIGGER tr_bi_order
  BEFORE INSERT ON order_table
  FOR EACH ROW EXECUTE PROCEDURE before_insert_order();

Why is this the hard way? Because you check the value of the sequence on every insert. If you have only a few inserts per day and your system is not very busy, this is a viable approach.

If you cannot spare those CPU cycles, you could schedule a cron job to run at 00:00:01 on the first day of every month, executing a PG function via psql to update the sequence, and then just use the sequence as the default value for new order records (so no trigger is needed).
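
A rough sketch of the database side of that cron-driven variant follows. The function name reset_order_sequence is hypothetical, and the sketch assumes the order_table, id column and my_sequence from the trigger example:

```sql
-- One-time setup: let the column default draw from the sequence,
-- which replaces the BEFORE INSERT trigger.
ALTER TABLE order_table
  ALTER COLUMN id SET DEFAULT nextval('my_sequence');

-- Hypothetical helper for the monthly cron job to call.
CREATE FUNCTION reset_order_sequence() RETURNS bigint AS $$
  -- Sets the sequence to YYYYMM0000; the next nextval() returns YYYYMM0001.
  SELECT setval('my_sequence', to_char(now(), 'YYYYMM')::bigint * 10000);
$$ LANGUAGE SQL;
```

The cron entry would then run something like psql -c "SELECT reset_order_sequence();" against the right database. Note that any order inserted after midnight but before the job runs still gets a number with the previous month's prefix.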

Patrick answered Oct 15 '22 02:10


Another idea, which I would prefer, is this:

  1. Create a function which generates your id from the timestamp and your invoice number.
  2. Create a regular table with
    • a foo_id: simple sequence (incrementing int)
    • a ts_created field.
  3. Generate your invoice ids in the query when required.

Here is how it looks. First we create the function to generate an acme_id from a bigint and a timestamp:

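CREATE FUNCTION acme_id( seqint bigint, seqts timestamp with time zone )
RETURNS char(10)
AS $$
  -- to_char() zero-pads both parts; format() widths such as %02s pad with
  -- spaces, which would render June as ' 6' instead of '06'.
  SELECT (to_char(seqts, 'YYYYMM') || to_char(seqint, 'FM0000'))::char(10);
$$ LANGUAGE SQL
-- STABLE rather than IMMUTABLE: the year and month of a
-- timestamp with time zone depend on the session's TimeZone setting.
STABLE;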

And then we create a table.

CREATE TABLE foo (
  foo_id        int  PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  data          text,
  ts_created    timestamp with time zone DEFAULT NOW()
);
CREATE INDEX ON foo(ts_created, foo_id);

Now you can generate what you're looking for with a simple window function.

SELECT acme_id(
  ROW_NUMBER() OVER (
    PARTITION BY date_trunc('MONTH', ts_created)
    ORDER BY ts_created, foo_id  -- foo_id breaks ties between equal timestamps
  ),
  ts_created
), *
FROM foo;

I would build my system such that foo_id is used internally. So long as you never delete from foo, you'll always be able to render the same invoice id from the row; you just won't have to store it.

You can even cache the rendering and invoice ids with a [materialized] view.

CREATE MATERIALIZED VIEW acme_invoice_view
AS
    SELECT acme_id(
      ROW_NUMBER() OVER (
        PARTITION BY date_trunc('MONTH', ts_created)
        ORDER BY ts_created, foo_id  -- foo_id breaks ties between equal timestamps
      ),
      ts_created
    ), *
    FROM foo;

SELECT * FROM acme_invoice_view;
  acme_id   | foo_id | insert_date | data 
------------+--------+-------------+------
 2021100001 |      1 | 2021-10-12  | bar
(1 row)
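
A materialized view is a snapshot, so rows added to foo afterwards only show up once the view is refreshed. A minimal sketch of that step, assuming the acme_invoice_view defined above (the unique index on foo_id is an addition needed for the CONCURRENTLY option):

```sql
-- A unique index is required before REFRESH ... CONCURRENTLY can be used.
CREATE UNIQUE INDEX ON acme_invoice_view (foo_id);

-- Recompute the cached invoice ids without blocking readers of the view.
REFRESH MATERIALIZED VIEW CONCURRENTLY acme_invoice_view;
```

A plain REFRESH MATERIALIZED VIEW acme_invoice_view; also works without the index, at the cost of locking the view while it is rebuilt.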

Keep in mind the drawbacks to this approach:

  • Rows in the invoice table can never be deleted (you could add a bool to deactivate them instead).
  • The foo_id and ts_created should be immutable (never updated) or you may get a new invoice ID. Surrogate keys (foo_id) should never change by definition anyway.
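
The deactivation flag from the first drawback could look roughly like this. The column name is_active and the foo_id value 42 are purely illustrative; the key point is that inactive rows are still numbered and only hidden afterwards, so the remaining invoice ids do not shift:

```sql
-- Hypothetical flag column; "is_active" is not part of the original schema.
ALTER TABLE foo ADD COLUMN is_active boolean NOT NULL DEFAULT true;

-- Deactivate instead of DELETE, so later rows keep their counter.
UPDATE foo SET is_active = false WHERE foo_id = 42;

-- Number every row first, then hide inactive ones in the outer query.
SELECT *
FROM (
  SELECT acme_id(
    ROW_NUMBER() OVER (
      PARTITION BY date_trunc('MONTH', ts_created)
      ORDER BY ts_created, foo_id
    ),
    ts_created
  ) AS invoice_id, *
  FROM foo
) numbered
WHERE is_active;
```

Filtering with WHERE is_active inside the inner query instead would renumber the rows that follow a deactivated one, which is exactly what this approach needs to avoid.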

The benefits of this approach:

  • Storing a real timestamp which is likely very useful on an invoice
  • Real surrogate key (which I would use in all contexts instead of an invoice ID); it simplifies linking to other tables and is faster and more efficient.
  • Single source of truth for the invoice date
  • Easy to issue a new invoice-id scheme and to even map it to an older scheme.

NO WAR WITH RUSSIA answered Oct 15 '22 02:10