Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Archive old data in Postgresql

I'm currently expecting for somebody to advice me on the process which I'm gonna take forward for DB archiving.

I've database (DB-1) which has 2 very large tables, one table having 25 GB of data and another is 20 GB of data. Which cause major performance issues even I have indexes.

So, we considered to archive the old data with the below process,

  1. Clone a new database (DB-2) from existing database (DB-1).
  2. Delete the old data from DB-1, so it will have only the last 2 years records. In case If I need old data can connect DB-2.
  3. Every month should move an old data from DB-1 to DB-2, and delete the moved rows from DB-1.
like image 867
King_Fisher Avatar asked Jul 11 '19 09:07

King_Fisher


People also ask

How do I archive data in PostgreSQL?

Export the detached table data to the designated folder in the S3 bucket. In this step, you run the psql command to archive the data to the S3 bucket using two PostgreSQL functions: create_s3_uri defines the target S3 bucket information.

What is archiving in PostgreSQL?

In PostgreSQL terms, copying out generated WAL files is called archiving, and getting the server to read in a WAL file and apply it is called restoring.

What is continuous archiving in PostgreSQL?

PostgreSQL first records any transaction on the database to the WAL log files before it writes the changes to the database cluster's data files. With continuous archiving, the WAL files are copied to secondary storage, which has a couple of benefits.


2 Answers

That is the wrong approach.

What you are looking for is partitioning.

You can create range partitions covering one year each. To remove old data all you need to do is to drop the partition for the year(s) no longer needed.

If you need to keep the data for some reasons, you can also just detach the partition from the table. Then the data is still "lying around", but would not show up in the (partitioned) table. You could query the (detached) partition directly to access that data. You could even move that (detached) partition to a slower harddisk to free up space on your fast disks if you have more than one.

But you might even see that partitioning alone might already improves performance, but that depends a lot on your queries.

Note that you should use Postgres 11 for that, as partitioning wasn't that sophisticated in older versions.

like image 127
a_horse_with_no_name Avatar answered Oct 19 '22 04:10

a_horse_with_no_name


While you should no doubt upgrade your current version (I'd suggest moving away from the EDB system you are working on now, and going to community based Postgres 11) even if you can't upgrade, partitioning is still a much better answer than creating a second database.

By recreating your table as a set of partitions within the same database, you will be able to add/remove data in a much cleaner fashion, and it will make dealing with Vacuums much easier. Even in 9.5, you can take advantage of table inheritance to build out partitions by first adding partitions for incoming data, and then creating partitions at various intervals (probably monthly, since you want to run monthly cleanup) and moving the data into those partitions. This can be accomplished atomically with a series of INSERT INTO partition SELECT * FROM table WHERE <timestamp> style statements.

I suspect you can probably manage this yourself (you need basic sql and the ability to write simple triggers/functions... here is a link to the 9.5 docs), but if you need help, you can engage with one of the Postgres chat communities, or contact a support company if you want a deeper dive.

like image 42
xzilla Avatar answered Oct 19 '22 06:10

xzilla