I can imagine table partition by a date (in particular for logs) is something widely used, but I am not able to find a good answer to my problem.
I want to create a table partition by week (the number of records is to big to make it monthly). The reason for it to be weekly is that I need the data for an algorithm that will be looking for dates as part of the process.
My problem is I want it to create the partitions considering the week and using the "typical" approach I would have to be creating it manually. Something like this.
CREATE TABLE measurement_y2013w01 (
CHECK ( logdate >= DATE '2013-01-07' AND logdate < DATE '2013-01-14' )
) INHERITS (measurement);
CREATE TABLE measurement_y2006w02 (
CHECK ( logdate >= DATE '2013-01-14' AND logdate < DATE '2013-01-21' )
) INHERITS (measurement);
...
But I want it to be made automatically. I don't want to be creating a partition for every week one by one.
My rule for name giving would be yYYYYwWW for the naming of the partition or start datadYYYYMMDD.
I thought in checking for partitions when inserting using something like this:
SELECT
nmsp_parent.nspname AS parent_schema,
parent.relname AS parent,
nmsp_child.nspname AS child,
child.relname AS child_schema
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
JOIN pg_namespace nmsp_parent ON nmsp_parent.oid = parent.relnamespace
JOIN pg_namespace nmsp_child ON nmsp_child.oid = child.relnamespace
and if the partition is not there create it before the insert, but this seams a lot inefficient, considering the number of records inserted.
My other alternative is to have an external process running weekly creating this partition, but I was trying to avoid this.
Is there a more efficient solution for this that I am missing, that is used, for example, for monthly checks?
Ok, so let's create ourselves a function to handle that!
CREATE OR REPLACE FUNCTION create_partition_measurement( DATE, DATE )
returns void AS $$
DECLARE
create_query text;
BEGIN
FOR create_query IN SELECT
'CREATE TABLE measurement_' || TO_CHAR( d, 'YYYY_WW' ) || ' (
CHECK ( EXTRACT(YEAR FROM logdate) = EXTRACT(YEAR FROM TIMESTAMP ''' || d || ''') AND EXTRACT(WEEK FROM logdate) = EXTRACT(WEEK FROM TIMESTAMP ''' || d || ''') )
) INHERITS (measurement);'
FROM generate_series( $1, $2, '1 week' ) AS d LOOP
EXECUTE create_query;
END LOOP;
END;
$$
language plpgsql;
With this you can now call something like
SELECT create_partition_measurement ('2015/02/08','2015/03/01');
and have your partitions created. First step to automation, done.
I tested all this in my own database using the following test table:
CREATE TABLE measurement (id INT NOT NULL PRIMARY KEY, id_user INT NOT NULL, logdate TIMESTAMP NOT NULL);
After creating the partitions using the function above, I was able to:
This should be enough =)
Now, about automating the creation process. I use a simple cron script to call this function for me every month and a couple of monitoring scripts to make sure everything is working as it should. The cron looks like this:
0 0 1 * * /var/lib/postgresql/create_partitions.sh
And the script will run the command using the current date and the current date + 1 month. It looks something like this:
startDate=`date "+%Y/%m/%d"`
endDate=`date -u -d "+1 month -$(date +%d) days" "+%Y/%m/%d"
psql -U "$dbUser" -w -c "SELECT create_partition_measurement('$startDate','$endDate');"
If you need to include indexes, PKs, FKs in the table, or help with the triggers to make all this work, just let me know.
You can use date_trunc function to round data value to the first day of the week. For partition naming you could use year and week number in the year YYWW:
CREATE TABLE measurement_1301 (
CHECK ( date_trunc( 'week', logdate )::date = DATE '2013-01-07') )
INHERITS (measurement);
CREATE TABLE measurement_1302 (
CHECK ( date_trunc( 'week', logdate )::date = DATE '2013-01-14') )
INHERITS (measurement);
CREATE TABLE measurement_1303 (
CHECK ( date_trunc( 'week', logdate )::date = DATE '2013-01-21') )
INHERITS (measurement);
-- Default partition:
CREATE TABLE measurement_default () INHERITS (measurement);
For partition name generation use to_char( logdate::date, 'YYWW')
and if you if you prefer yYYYYwWW: to_char( logdate::date, '"y"YYYY"w"WW')
And to check the existing partitions you could use very simple query:
SELECT relname FROM pg_class
WHERE relname ~ '^measurement_[0-9]{4}$'
ORDER BY RIGHT(relname,4) DESC
Data routing trigger inserts into appropriate partition and fall back to the default, if there is no partition for the given week.
CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF to_char( NEW.logdate::date, 'YYWW') = '1301' THEN
INSERT INTO measurement_1301 VALUES (NEW.*);
ELSIF to_char( NEW.logdate::date, 'YYWW') = '1302' THEN
INSERT INTO measurement_1302 VALUES (NEW.*);
ELSIF to_char( NEW.logdate::date, 'YYWW') = '1303' THEN
INSERT INTO measurement_1303 VALUES (NEW.*);
-- ...
ELSE
INSERT INTO measurement_default VALUES (NEW.*);
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER measurement_insert_tr BEFORE INSERT ON measurement
FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger()
You will have create in advance all partitions. Or you can use the default partition and time to time re-partition the data stored there, creating new partitions and adjusting the insert trigger.
PS you can find scripts for partitioning solution based on triggers here http://hg.nowitworks.eu/postgresql-triggers-and-partitions
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With