Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgresQL Automating VACUUM FULL for bloated tables

We have a product using PostgreSQL database server that is deployed at a couple of hundred clients. Some of them have gathered tens of gigabytes of data over the years. So in the next version we will introduce automated cleanup procedures which will gradually archive and DELETE old records during nightly batch jobs.

If I understand correctly, autovacuum will kick in and analyze and reorganize the tuples, so the performance will be like it was when fewer records existed.

The actual disk space will not be released, if I understand correctly, since that only happens with a VACUUM FULL, and that is not triggered by autovacuum.

So I was thinking about an automated process that would do this.

I found the bloat view that is used by nagios check_postgres at http://wiki.postgresql.org/wiki/Show_database_bloat.

Is this view any good? Am I correct in understanding that if the tbloat is > 2, it could use a VACUUM FULL? And if the ibloat is too high, it could use a REINDEX?

Any comments on the following job to run as daily batch job?

  • vacuumdb -Z mydatabase #vacuum with analyze-only
  • select tablename from bloatview order by tbloat desc limit 1
  • vacuumdb -f -t tablename mydatabase
  • select tablename, iname from bloatview order by ibloat desc limit 1
  • reindexdb -t tablename -i iname mydatabase

Off course, I still need to wrap it in a nice perl script in crontab (we're using ubuntu 12), or does postgresql have some sort of scheduler I could do this with?

Or is this total overkill and is there a much simpler procedure?

like image 990
GeertPt Avatar asked Dec 18 '12 11:12

GeertPt


2 Answers

You probably don't need it. It is good to do this once — after first archiving job so you'll get your disk space back but after that your daily archiving job and autovacuum will prevent dead tuples bloat.

Also instead of vacuum full it is often better to run cluster table_name using index_name; analyze table_name. This will reorder rows according to an index. This way related table rows can be saved physically close on disk, which can limit disk seeking (important on classic disk drives, largely irrelevant on SSD) and a number of reads for your typical queries.

And remember that both vacuum full and cluster will make your tables unusable while they run.

like image 91
Tometzky Avatar answered Oct 09 '22 14:10

Tometzky


OK, I worked my way through it.

I simplified/reworked the view to split it up in the following two:

CREATE OR REPLACE VIEW
    bloat_datawidth AS
SELECT
    ns.nspname AS schemaname,
    tbl.oid   AS relid,
    tbl.relname,
    CASE
        WHEN every(avg_width IS NOT NULL)
        THEN SUM((1-null_frac)*avg_width) + MAX(null_frac) * 24
        ELSE NULL
    END AS datawidth
FROM
    pg_attribute att
JOIN
    pg_class tbl
ON
    att.attrelid = tbl.oid
JOIN
    pg_namespace ns
ON
    ns.oid = tbl.relnamespace
LEFT JOIN
    pg_stats s
ON
    s.schemaname=ns.nspname
AND s.tablename = tbl.relname
AND s.inherited=false
AND s.attname=att.attname
WHERE
    att.attnum > 0
AND tbl.relkind='r'
GROUP BY
    1,2,3;

And

CREATE OR REPLACE VIEW
    bloat_tables AS
SELECT
    bdw.schemaname,
    bdw.relname,
    bdw.datawidth,
    cc.reltuples::bigint,
    cc.relpages::bigint,
    ceil(cc.reltuples*bdw.datawidth/current_setting('block_size')::NUMERIC)::bigint AS expectedpages,
    100 - (cc.reltuples*100*bdw.datawidth)/(current_setting('block_size')::NUMERIC*cc.relpages) AS bloatpct
FROM
    bloat_datawidth bdw
JOIN
    pg_class cc
ON
    cc.oid = bdw.relid
AND cc.relpages > 1
AND bdw.datawidth IS NOT NULL;

And the cron job:

#!/bin/bash

MIN_BLOAT=65
MIN_WASTED_PAGES=100
LOG_FILE=/var/log/postgresql/bloat.log
DATABASE=unity-stationmaster
SCHEMA=public

if [[ "$(id -un)" != "postgres" ]]
then
echo "You need to be user postgres to run this script."
exit 1
fi

TABLENAME=`psql $DATABASE -t -A -c "select relname from bloat_tables where bloatpct > $MIN_BLOAT and relpages-expectedpages > $MIN_WASTED_PAGES and schemaname ='$SCHEMA' order by wastedpages desc limit 1"`

if [[ -z "$TABLENAME" ]]
then
echo "No bloated tables." >> $LOG_FILE
exit 0
fi

vacuumdb -v -f -t $TABLENAME $DATABASE >> $LOG_FILE
like image 3
GeertPt Avatar answered Oct 09 '22 13:10

GeertPt