Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres partition by week

I can imagine table partition by a date (in particular for logs) is something widely used, but I am not able to find a good answer to my problem.

I want to create a table partition by week (the number of records is to big to make it monthly). The reason for it to be weekly is that I need the data for an algorithm that will be looking for dates as part of the process.

My problem is I want it to create the partitions considering the week and using the "typical" approach I would have to be creating it manually. Something like this.

    CREATE TABLE measurement_y2013w01 (
CHECK ( logdate >= DATE '2013-01-07' AND logdate < DATE '2013-01-14' ) 
    ) INHERITS (measurement);

    CREATE TABLE measurement_y2006w02 (
CHECK ( logdate >= DATE '2013-01-14' AND logdate < DATE '2013-01-21' ) 
    ) INHERITS (measurement);

... 

But I want it to be made automatically. I don't want to be creating a partition for every week one by one.

My rule for name giving would be yYYYYwWW for the naming of the partition or start datadYYYYMMDD.

I thought in checking for partitions when inserting using something like this:

 SELECT
    nmsp_parent.nspname AS parent_schema,
    parent.relname      AS parent,
    nmsp_child.nspname  AS child,
    child.relname       AS child_schema
FROM pg_inherits
    JOIN pg_class parent        ON pg_inherits.inhparent = parent.oid
    JOIN pg_class child         ON pg_inherits.inhrelid   = child.oid
    JOIN pg_namespace nmsp_parent   ON nmsp_parent.oid  = parent.relnamespace
    JOIN pg_namespace nmsp_child    ON nmsp_child.oid   = child.relnamespace

and if the partition is not there create it before the insert, but this seams a lot inefficient, considering the number of records inserted.

My other alternative is to have an external process running weekly creating this partition, but I was trying to avoid this.

Is there a more efficient solution for this that I am missing, that is used, for example, for monthly checks?

like image 501
RGPT Avatar asked Apr 17 '13 00:04

RGPT


2 Answers

Ok, so let's create ourselves a function to handle that!

CREATE OR REPLACE FUNCTION create_partition_measurement( DATE, DATE )
returns void AS $$
DECLARE
create_query text;
BEGIN
    FOR create_query IN SELECT

        'CREATE TABLE measurement_' || TO_CHAR( d, 'YYYY_WW' ) || ' (
        CHECK ( EXTRACT(YEAR FROM logdate) = EXTRACT(YEAR FROM TIMESTAMP ''' || d || ''') AND EXTRACT(WEEK FROM logdate) = EXTRACT(WEEK FROM TIMESTAMP ''' || d || ''') )
        ) INHERITS (measurement);'

        FROM generate_series( $1, $2, '1 week' ) AS d LOOP

        EXECUTE create_query;

    END LOOP;
END;
$$
language plpgsql;

With this you can now call something like

SELECT create_partition_measurement ('2015/02/08','2015/03/01'); 

and have your partitions created. First step to automation, done.

I tested all this in my own database using the following test table:

CREATE TABLE measurement (id INT NOT NULL PRIMARY KEY, id_user INT NOT NULL, logdate TIMESTAMP NOT NULL);

After creating the partitions using the function above, I was able to:

  • Insert data into the correct partition;
  • Get an error when trying to insert data from one week into another week's partition;
  • Automatically create partition for several weeks, and;
  • Get an error should I try to create a partition for a week where one already exists.

This should be enough =)

Now, about automating the creation process. I use a simple cron script to call this function for me every month and a couple of monitoring scripts to make sure everything is working as it should. The cron looks like this:

0 0 1 * * /var/lib/postgresql/create_partitions.sh

And the script will run the command using the current date and the current date + 1 month. It looks something like this:

startDate=`date "+%Y/%m/%d"`
endDate=`date -u -d "+1 month -$(date +%d) days" "+%Y/%m/%d"
psql -U "$dbUser" -w -c "SELECT create_partition_measurement('$startDate','$endDate');"

If you need to include indexes, PKs, FKs in the table, or help with the triggers to make all this work, just let me know.

like image 117
Lucas Leite Avatar answered Oct 02 '22 10:10

Lucas Leite


You can use date_trunc function to round data value to the first day of the week. For partition naming you could use year and week number in the year YYWW:

CREATE TABLE measurement_1301 (
    CHECK ( date_trunc( 'week', logdate )::date = DATE '2013-01-07') )
    INHERITS (measurement);

CREATE TABLE measurement_1302 (
    CHECK ( date_trunc( 'week', logdate )::date = DATE '2013-01-14') )
    INHERITS (measurement);

CREATE TABLE measurement_1303 (
    CHECK ( date_trunc( 'week', logdate )::date = DATE '2013-01-21') )
    INHERITS (measurement);

-- Default partition:
CREATE TABLE measurement_default () INHERITS (measurement);

For partition name generation use to_char( logdate::date, 'YYWW') and if you if you prefer yYYYYwWW: to_char( logdate::date, '"y"YYYY"w"WW')

And to check the existing partitions you could use very simple query:

SELECT relname FROM pg_class
WHERE relname ~ '^measurement_[0-9]{4}$'
ORDER BY RIGHT(relname,4) DESC

Data routing trigger inserts into appropriate partition and fall back to the default, if there is no partition for the given week.

CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN

    IF to_char( NEW.logdate::date, 'YYWW') = '1301' THEN
        INSERT INTO measurement_1301 VALUES (NEW.*);
    ELSIF to_char( NEW.logdate::date, 'YYWW') = '1302' THEN
        INSERT INTO measurement_1302 VALUES (NEW.*);
    ELSIF to_char( NEW.logdate::date, 'YYWW') = '1303' THEN
        INSERT INTO measurement_1303 VALUES (NEW.*);
    -- ...
    ELSE
      INSERT INTO measurement_default VALUES (NEW.*);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_insert_tr BEFORE INSERT ON measurement
FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger()

You will have create in advance all partitions. Or you can use the default partition and time to time re-partition the data stored there, creating new partitions and adjusting the insert trigger.

PS you can find scripts for partitioning solution based on triggers here http://hg.nowitworks.eu/postgresql-triggers-and-partitions

like image 25
nad2000 Avatar answered Oct 02 '22 10:10

nad2000