I have a very large postgres database that has one particular schema in it which is dropped in and recreated nightly. After all of the tables in that schema are created I want to vacuum analyze them, however the database is so large that if a do a full db VACUUM ANALYZE;
it takes about a half hour.
How can I go about vacuum analyzing each of the tables in this schema only without writing a separate SQL command for each table?
PostgreSQL ANALYZE command collects statistics about specific table columns, entire table, or entire database. The PostgreSQL query planner then uses that data to generate efficient execution plans for queries. Examples: ANALYZE users; collects statistics for users table.
To list the tables in the current database, you can run the \dt command, in psql : If you want to perform an SQL query instead, run this: SELECT table_name FROM information_schema.
When a vacuum process runs, the space occupied by these dead tuples is marked reusable by other tuples. An “analyze” operation does what its name says – it analyzes the contents of a database's tables and collects statistics about the distribution of values in each column of every table.
VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This form is much slower and requires an ACCESS EXCLUSIVE lock on each table while it is being processed.
The bash function below utilizes the CLI tool psql
to vacuum analyze tables in a single schema which can be identified by either passing the name of the schema as the first parameter to the function or setting the environment variable PG_SCHEMA
:
vacuum_analyze_schema() {
# vacuum analyze only the tables in the specified schema
# postgres info can be supplied by either passing it as parameters to this
# function, setting environment variables or a combination of the two
local pg_schema="${1:-${PG_SCHEMA}}"
local pg_db="${2:-${PG_DB}}"
local pg_user="${3:-${PG_USER}}"
local pg_host="${4:-${PG_HOST}}"
echo "Vacuuming schema \`${pg_schema}\`:"
# extract schema table names from psql output and put them in a bash array
local psql_tbls="\dt ${pg_schema}.*"
local sed_str="s/${pg_schema}\s+\|\s+(\w+)\s+\|.*/\1/p"
local table_names=$( echo "${psql_tbls}" | psql -d "${pg_db}" -U "${pg_user}" -h "${pg_host}" | sed -nr "${sed_str}" )
local tables_array=( $( echo "${table_names}" | tr '\n' ' ' ) )
# loop through the table names creating and executing a vacuum
# command for each one
for t in "${tables_array[@]}"; do
echo "doing table \`${t}\`..."
psql -d "${pg_db}" -U "${pg_user}" -h "${pg_host}" \
-c "VACUUM (ANALYZE) ${pg_schema}.${t};"
done
}
This function can be added to your .bashrc
to provide the ability to invoke it from the command line at any time. Like the schema, Postgres connection and database values can be set by either supplying them as function parameters:
# params must be in this order
vacuum_analyze_schema '<your-pg-schema>' '<your-pg-db>' '<your-pg-user>' '<your-pg-host>'
or by setting environment variables:
PG_SCHEMA='<your-pg-schema>'
PG_USER='<your-pg-user>'
PG_HOST='<your-pg-host>'
PG_DB='<your-pg-db>'
vacuum_analyze_schema
or by a combination of both. Values passed as params will take precedence over corresponding environment vars.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With