Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Vacuum analyze all tables in a schema postgres

Tags:

postgresql

I have a very large postgres database that has one particular schema in it which is dropped in and recreated nightly. After all of the tables in that schema are created I want to vacuum analyze them, however the database is so large that if a do a full db VACUUM ANALYZE; it takes about a half hour.

How can I go about vacuum analyzing each of the tables in this schema only without writing a separate SQL command for each table?

like image 277
Grant Humphries Avatar asked Apr 17 '15 22:04

Grant Humphries


People also ask

How do I analyze all tables in PostgreSQL?

PostgreSQL ANALYZE command collects statistics about specific table columns, entire table, or entire database. The PostgreSQL query planner then uses that data to generate efficient execution plans for queries. Examples: ANALYZE users; collects statistics for users table.

How do I see all the tables in a schema in PostgreSQL?

To list the tables in the current database, you can run the \dt command, in psql : If you want to perform an SQL query instead, run this: SELECT table_name FROM information_schema.

What is the difference between vacuum and analyze in PostgreSQL?

When a vacuum process runs, the space occupied by these dead tuples is marked reusable by other tuples. An “analyze” operation does what its name says – it analyzes the contents of a database's tables and collects statistics about the distribution of values in each column of every table.

What does vacuum do in PostgreSQL?

VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This form is much slower and requires an ACCESS EXCLUSIVE lock on each table while it is being processed.


1 Answers

The bash function below utilizes the CLI tool psql to vacuum analyze tables in a single schema which can be identified by either passing the name of the schema as the first parameter to the function or setting the environment variable PG_SCHEMA:

vacuum_analyze_schema() {
    # vacuum analyze only the tables in the specified schema

    # postgres info can be supplied by either passing it as parameters to this
    # function, setting environment variables or a combination of the two
    local pg_schema="${1:-${PG_SCHEMA}}"
    local pg_db="${2:-${PG_DB}}"
    local pg_user="${3:-${PG_USER}}"
    local pg_host="${4:-${PG_HOST}}"

    echo "Vacuuming schema \`${pg_schema}\`:"

    # extract schema table names from psql output and put them in a bash array
    local psql_tbls="\dt ${pg_schema}.*"
    local sed_str="s/${pg_schema}\s+\|\s+(\w+)\s+\|.*/\1/p"
    local table_names=$( echo "${psql_tbls}" | psql -d "${pg_db}" -U "${pg_user}" -h "${pg_host}"  | sed -nr "${sed_str}" )
    local tables_array=( $( echo "${table_names}" | tr '\n' ' ' ) )

    # loop through the table names creating and executing a vacuum
    # command for each one
    for t in "${tables_array[@]}"; do
        echo "doing table \`${t}\`..."
        psql -d "${pg_db}" -U "${pg_user}" -h "${pg_host}" \
            -c "VACUUM (ANALYZE) ${pg_schema}.${t};"
    done
}

This function can be added to your .bashrc to provide the ability to invoke it from the command line at any time. Like the schema, Postgres connection and database values can be set by either supplying them as function parameters:

# params must be in this order
vacuum_analyze_schema '<your-pg-schema>' '<your-pg-db>' '<your-pg-user>' '<your-pg-host>'

or by setting environment variables:

PG_SCHEMA='<your-pg-schema>'
PG_USER='<your-pg-user>'
PG_HOST='<your-pg-host>'
PG_DB='<your-pg-db>'

vacuum_analyze_schema

or by a combination of both. Values passed as params will take precedence over corresponding environment vars.

like image 98
Grant Humphries Avatar answered Sep 22 '22 07:09

Grant Humphries