What i have:
Simplified schema of table, that will be very big:
id| integer | not null default nextval('table_id_seq'::regclass)
url_id | integer | not null
domain_id | integer | not null
position | integer | not null
The problem with the schema above is that I don't have the exact answer on how to partition it. Data for all periods is going to be used (NO queries will have date filters).
I thought about partitioning on "domain_id" field, but the problem is that it is hard to predict how many rows each partition will have.
My main question is:
Does is make sense to partition data if i don't use partition pruning and i am not going to delete old data?
What will be pros/cons of that ?
How will degrade my import speed, if i won't do partitioning?
Another question related to normalization:
Should url be exported to another table?
Pros of normalization
Pros of denormalization
Can anybody give me any advice? Thanks!
As commercial database vendors are bragging about their capabilities we decided to push PostgreSQL to the next level and exceed 1 billion rows per second to show what we can do with Open Source. To those who need even more: 1 billion rows is by far not the limit - a lot more is possible. Watch and see how we did it.
Before PostgreSQL 12, performance could be affected by too many table partitions, and it was recommended to have 100 or less partitions. But as of version 12, partitioning performance has been greatly improved, so that even thousands of partitions can be processed efficiently.
With table partitioning, you can either choose to re-index the table data in its entirety or selectively. Doing so selectively will drastically reduce the time to execute this daily action. For some clients, I have seen between 50 and 75% improvement in performance.
Partitioning is most useful if you are going to either have selection criteria in most queries which allow the planner to skip access to most of the partitions most of the time, or if you want to periodically purge all rows that are assigned to a partition, or both. (Dropping a table is a very fast way to delete a large number of rows!) I have heard of people hitting a threshold where partitioning helped keep indexes shallower, and therefore boost performance; but really that gets back to the first point, because you effectively move the first level of the index tree to another place -- it still has to happen.
On the face of it, it doesn't sound like partitioning will help.
Normalization, on the other hand, may improve performance more than you expect; by keeping all those rows narrower, you can get more of them into each page, reducing overall disk access. I would do proper 3rd normal form normalization, and only deviate from that based on evidence that it would help. If you see a performance problem while you still have disk space for a second copy of the data, try creating a denormalized table and seeing how performance is compared to the normalized version.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With