Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL - restored database smaller than original

Tags:

postgresql

I have made a backup of my PostgreSQL database using pg_dump to ".sql" file. When I restored the database, its size was 2.8GB compared with 3.7GB of the source (original) database. The application that access the database appears to work fine.

What is the reason to smaller size of the restored database?

like image 265
Nash Avatar asked Mar 24 '15 15:03

Nash


2 Answers

The reason is simple: during its normal operation, when rows are updated, PostgreSQL adds a new copy of the row and marks the old copy of the row as deleted. This is multi-version concurrency control (MVCC) in action. Then VACUUM reclaims the space taken by the old row for data that can be inserted in the future, but doesn't return this space to the operating system as it's in the middle of a file. Note that VACUUM isn't executed immediately, only after enough data has been modified in the table or deleted from the table.

What you're seeing is entirely normal. It just shows that PostgreSQL database will be larger in size than the sum of the sizes of the rows. Your new database will most likely eventaully grow to 3.7GB when you start actively using it.

like image 20
juhist Avatar answered Nov 21 '22 13:11

juhist


The short answer is that database storage is more optimised for speed than space.

For instance, if you inserted 100 rows into a table, then deleted every row with an odd numbered ID, the DBMS could write out a new table with only 50 rows, but it's more efficient for it to simply mark the deleted rows as free space and reuse them when you next insert a row. Therefore the table takes up twice as much space as is currently needed.

Postgres's use of "MVCC", rather than locking, for transaction management makes this even more likely, since an UPDATE usually involves writing a new row to storage, then marking the old row deleted once no transactions are looking at it.

By dumping and restoring the database, you are recreating a DB without all this free space. This is essentially what the VACUUM FULL command does - it rewrites the current data into a new file, then deletes the old file.

There is an extension distributed with Postgres called pg_freespace which lets you examine some of this. e.g. you can list the main table size (not including indexes and columns stored in separate "TOAST" tables) and free space used by each table with the below:

Select oid::regclass::varchar as table,
      pg_size_pretty(pg_relation_size(oid)/1024 * 1024) As size,
      pg_size_pretty(sum(free)) As free
 From (
   Select c.oid,
       (pg_freespace(c.oid)).avail As free
     From pg_class c
     Join pg_namespace n on n.oid = c.relnamespace
    Where c.relkind = 'r'
      And n.nspname Not In ('information_schema', 'pg_catalog')
 ) tbl
 Group By oid
 Order By pg_relation_size(oid) Desc, sum(free) Desc;
like image 114
IMSoP Avatar answered Nov 21 '22 13:11

IMSoP