Billions rows in PostgreSql: partition or not to partition?

What I have:

  • A simple server: one Xeon with 8 logical cores, 16 GB RAM, and an mdadm RAID 1 of two 7200 rpm drives.
  • PostgreSql
  • A lot of data to work with: up to 30 million rows are imported per day.
  • Time: complex queries may run for up to an hour.

Simplified schema of the table that will be very big:

id          | integer | not null default nextval('table_id_seq'::regclass)
url_id      | integer | not null
domain_id   | integer | not null
position    | integer | not null

The problem with the schema above is that I don't know exactly how to partition it. Data from all periods is going to be used (NO queries will have date filters).

I thought about partitioning on the "domain_id" field, but the problem is that it is hard to predict how many rows each partition would have (see the hash-partitioning sketch below).
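
For illustration, here is a minimal hash-partitioning sketch. It assumes a modern PostgreSQL (declarative partitioning needs version 10, hash partitioning version 11; in 2012 this would have required inheritance-based partitioning instead), and the table and partition names are invented. Hash partitioning spreads "domain_id" values across partitions evenly, so per-domain row counts don't need to be predicted:

CREATE TABLE positions (
    id        serial  NOT NULL,
    url_id    integer NOT NULL,
    domain_id integer NOT NULL,
    position  integer NOT NULL
) PARTITION BY HASH (domain_id);

-- rows are routed to a partition by hash(domain_id) modulo 8
CREATE TABLE positions_p0 PARTITION OF positions
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);
-- ...repeat for REMAINDER 1 through 7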

My main question is:

Does it make sense to partition the data if I don't use partition pruning and I am not going to delete old data?

What would the pros and cons be?

How will my import speed degrade if I don't do partitioning?
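
As background for the import-speed question: the fastest built-in bulk-load path in PostgreSQL is COPY, with or without partitioning. A minimal sketch, assuming a CSV staging file and the table name used in the sketch above:

-- the file path and table name are placeholders
COPY positions (url_id, domain_id, position)
FROM '/path/to/daily_batch.csv' WITH (FORMAT csv);
-- from a client without server filesystem access, use psql's \copy instead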

Another question related to normalization:

Should the "url" field be extracted into a separate table?

Pros of normalization

  • The table will have rows with an average size of only 20-30 bytes.
  • Joins on the "url_id" field should be much faster than joins on a "url" field.

Pros of denormalization

  • Data can be imported much, much faster, as I don't have to look up the "url" table before each insert (but see the get-or-create sketch below).
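
For reference, on PostgreSQL 9.5 and later the per-insert lookup can be folded into a single get-or-create statement. A sketch, assuming a "urls" table with a unique constraint on its "url" column:

-- returns the id whether the url is new or already present;
-- the no-op DO UPDATE is what makes RETURNING yield a row for existing urls
INSERT INTO urls (url)
VALUES ('http://example.com/page')
ON CONFLICT (url) DO UPDATE SET url = EXCLUDED.url
RETURNING id;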

Can anybody give me any advice? Thanks!

asked May 03 '12 by Oleg Golovanov



1 Answer

Partitioning is most useful if you are going to either have selection criteria in most queries which allow the planner to skip access to most of the partitions most of the time, or if you want to periodically purge all rows that are assigned to a partition, or both. (Dropping a table is a very fast way to delete a large number of rows!) I have heard of people hitting a threshold where partitioning helped keep indexes shallower and thereby boosted performance; but really that gets back to the first point, because you effectively move the first level of the index tree to another place -- it still has to happen.
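
To make the drop-a-table point concrete: purging a partition is a metadata operation, not a row-by-row delete. A sketch with assumed names, using modern declarative-partitioning syntax:

-- removes every row in that partition almost instantly by unlinking its files
DROP TABLE positions_2012_04;
-- contrast with a row-level purge, which must visit every matching row:
-- DELETE FROM positions WHERE ...;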

On the face of it, it doesn't sound like partitioning will help.

Normalization, on the other hand, may improve performance more than you expect; by keeping all those rows narrower, you can get more of them into each page, reducing overall disk access. I would do proper third-normal-form (3NF) normalization, and only deviate from that based on evidence that it would help. If you see a performance problem while you still have disk space for a second copy of the data, try creating a denormalized table and see how its performance compares to the normalized version.
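
A sketch of the normalized split described above (table and column names are assumed): each distinct URL is stored once, and the big table carries only a narrow 4-byte "url_id", which is what keeps its rows in the 20-30 byte range:

CREATE TABLE urls (
    id  serial PRIMARY KEY,
    url text   NOT NULL UNIQUE
);

-- the big table then references it with an integer column:
-- url_id integer NOT NULL REFERENCES urls (id)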

answered Sep 29 '22 by kgrittn