Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What would be the right steps for horizontal partitioning in Postgresql?

We have an E-commerce portal with a Postgresql 9.1 database. One very important table has at the moment 32 million records. If we want to deliver all items this table would grow to 320 million records, mostly dates. Which would be to heavy.

So we are thinking about horizontal partitioning / sharding. We can divide items in this table into 12 pieces horizontal (1 per month). What would be the best steps and technics to do so? Would horizontal partitioning within the database be good enough or do we have to start thinking about sharding?

like image 614
Brambo76 Avatar asked Apr 21 '12 06:04

Brambo76


People also ask

Which is a type of horizontal partitioning?

Two common types of horizontal partitioning are range and hash partitioning. On the other hand, vertical partitioning allows a table to be partitioned into disjoint sets of columns.

What is horizontal data partitioning?

Horizontal partitioning (often called sharding). In this strategy, each partition is a separate data store, but all partitions have the same schema. Each partition is known as a shard and holds a specific subset of the data, such as all the orders for a specific set of customers.


2 Answers

While 320 million is not small, it's not really huge either.

It largely depends on the queries you run on the table. If you always include the partition key in your queries then "regular" partitioning would probably work.

An example for this can be found in the PostgreSQL wiki:
http://wiki.postgresql.org/wiki/Month_based_partitioning

The manual also explains some of the caveats of partitioning:
http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html

If you are thinking about sharding, you might read how Instagram (which is powered by PostgreSQL) has implemented that:

http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

If you have mostly read-queries, another option might be to use streaming replication to setup multiple servers and distribute the read queries by connecting to the hot-standby for read access and connecting to the master for write access. I think pg-pool II can do that (somewhat) automatically. That can be combined with partitioning to further reduce the query runtime.

If you are adventurous and don't have very immediate needs to do so, you might also consider Postgres-XC which promises to support transparent horizontal scaling:
http://postgres-xc.sourceforge.net/

There is no final release yet, but it looks like this isn't taking too long

like image 139
a_horse_with_no_name Avatar answered Nov 01 '22 04:11

a_horse_with_no_name


Here is my sample code for partitioning: t_master is a view to be select/insert/update/delete in your application t_1 and t_2 is the underlying tables actually storing the data.

create or replace view t_master(id, col1)
as 
select id, col1 from t_1
union all
select id, col1 from t_2


CREATE TABLE t_1
(
  id bigint PRIMARY KEY,
  col1 text
);

CREATE TABLE t_2
(
  id bigint PRIMARY KEY,
  col1 text
);



CREATE OR REPLACE FUNCTION t_insert_partition_function()
returns TRIGGER AS $$
begin
raise notice '%s', 'hello';
    execute 'insert into t_'
        || ( mod(NEW.id, 2)+ 1 )
        || ' values ( $1, $2 )' USING NEW.id, NEW.col1 ;
    RETURN NULL;
end;
$$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION t_update_partition_function()
returns TRIGGER AS $$
begin
    raise notice '%s', 'hello';
    execute 'update t_'
        || ( mod(NEW.id, 2)+ 1 )
        || ' set id = $1, col1 = $2 where id = $1' 
        USING NEW.id, NEW.col1 ;
    RETURN NULL;
end;
$$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION t_delete_partition_function()
returns TRIGGER AS $$
begin
    raise notice '%s', 'hello';
    execute 'delete from t_'
        || ( mod(OLD.id, 2)+ 1 )
        || ' where id = $1' 
        USING OLD.id;
    RETURN NULL;
end;
$$
LANGUAGE plpgsql;



CREATE TRIGGER t_insert_partition_trigger instead of INSERT
ON t_master FOR each row 
execute procedure t_insert_partition_function();

CREATE TRIGGER t_update_partition_trigger instead of update
ON t_master FOR each row 
execute procedure t_update_partition_function();

CREATE TRIGGER t_delete_partition_trigger instead of delete
ON t_master FOR each row 
execute procedure t_delete_partition_function();
like image 26
TritonHo Avatar answered Nov 01 '22 03:11

TritonHo