Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to reduce PostgreSQL database size?

I was planning to move from MySQL to PostgreSQL because I wanted to make use of TimescaleDB.

Everything looked good until I did a check on the storage size used by PostgreSQL (v11.2) compared to MySQL (v5.6). For exactly the same number of rows (1,440,000) and content:

  • MySQL: 156 MB
  • PostgreSQL: 246 MB
  • PostgreSQL + TimescaleDB (partitioned/chunked data): 324 MB

The MySQL and PostgreSQL numbers are like for like (i.e. including indexes and other constraints), the PostgreSQL + TimescaleDB has overhead of adding a timestamp to the table. The table concerned looks like this:

create table cell(
    cell_id            serial not null
   ,ts                 timestamp not null
   ,parent_id          int references parent( parent_id )
   ,instance_id        smallint
   ,v                  float
   ,a                  float
   ,t                  float
   ,s1                 float
   ,s2                 float
   ,s3                 float
   ,s4                 float
   ,s5                 float
   ,primary key( cell_id )
);
create index ix_cell_pid on cell( parent_id );
create index ix_cell_inst on cell( instance_id );

Why does PostgreSQL occupy so much more storage than MySQL?
And is there some way of significantly reducing it near to the MySQL level?

like image 689
RayCh Avatar asked Apr 12 '19 15:04

RayCh


3 Answers

Adding a timestamp column should add no more than 11 MB in your case (1440000 * 8 bytes, no added padding).

Have you run VACUUM FULL in Postgres before you measured the size, for a fair comparison? I suspect table and index bloat.

Related:

  • VACUUM returning disk space to operating system
  • Calculating and saving space in PostgreSQL
  • Measure the size of a PostgreSQL table row

In MySQL the data type float is the single-precision floating point type occupying 4 bytes.

In Postgres the same float is the double-precision floating point type, occupying 8 bytes (alias: float8 or double precision).

That should explain another 44 MB of the difference. To compare apples with apples, create the Postgres table with 4-byte real columns (alias float4). Note the difference to MySQL, where real is used for 8-byte floating point numbers! Unfortunate disagreements.

The MySQL manual: https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html
The Postgres manual: https://www.postgresql.org/docs/current/datatype-numeric.html

Related:

  • Comparison of database column types in MySQL, PostgreSQL, and SQLite? (Cross-Mapping)

You show two indexes. Depending on what those are for, one multicolumn index might be able to replace both in Postgres - occupying just as much disk space as one of the two it replaces in this particular case (saving ~ 50 MB with the given specs).

create index ix_cell_pid on cell( parent_id, instance_id );

Consider:

  • Is a composite index also good for queries on the first field?
like image 136
Erwin Brandstetter Avatar answered Oct 10 '22 09:10

Erwin Brandstetter


In addition to Erwin Brandstetter's answer:

There are also the usually hidden system columns to take account of (used to implement MVCC) along with other tuple information such as hint bits (used to cache row visibility).

You can use the pageinspect extension to look at this information, e.g.:

create extension pageinspect;
select * from heap_page_items(get_raw_page('cell', 0));

See here for a description of the system columns.

Your indexes, too, will contain empty space to allow further tuples to be inserted more quickly. Look for "fillfactor" in the CREATE INDEX docs.

like image 30
Steven Winfield Avatar answered Oct 10 '22 08:10

Steven Winfield


  1. connect to the database ( in psql, type \c your-db-name)

  2. checkout how many size each table cost. refer to: https://wiki.postgresql.org/wiki/Disk_Usage

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 100;

  1. delete or truncte the table.
delete ...
truncte <table_name>  cascade; 
  1. (if disk space not released) close other client which connected to postgres
like image 28
Siwei Avatar answered Oct 10 '22 08:10

Siwei