
Amazon Redshift at 100% disk usage due to VACUUM query

Following the Amazon Redshift documentation, I ran a VACUUM on a 400 GB table which had never been vacuumed before, in an attempt to improve query performance. Unfortunately, the VACUUM has caused the table to grow to 1.7 TB (!!) and has brought the cluster's disk usage to 100%. I then tried to stop the VACUUM by running a CANCEL query in the superuser queue (you enter it by running "set query_group='superuser';"), but although the CANCEL didn't raise an error, it had no effect on the vacuum query, which keeps running.

What can I do?
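For reference, the cancel attempt described above looks roughly like this (the pid value 18764 is purely illustrative; use the pid of your own vacuum query, e.g. from STV_RECENTS):

    -- Switch this session into the superuser queue so the CANCEL
    -- is not queued behind the running vacuum (requires a superuser):
    set query_group='superuser';

    -- Cancel the query by its process id (18764 is a made-up example):
    cancel 18764;

    -- Return the session to its normal queue:
    reset query_group;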

asked Jul 16 '14 by Maxim Kogan




2 Answers

Hint: run the query below (taken from here) to see which tables you should vacuum.

Note: this only helps you see which tables are big and what you can gain by vacuuming each one.

select trim(pgdb.datname) as Database,
       trim(a.name) as Table,
       ((b.mbytes/part.total::decimal)*100)::decimal(5,2) as pct_of_total,
       b.mbytes,
       b.unsorted_mbytes
from stv_tbl_perm a
join pg_database as pgdb on pgdb.oid = a.db_id
join (select tbl,
             sum(decode(unsorted, 1, 1, 0)) as unsorted_mbytes,
             count(*) as mbytes
      from stv_blocklist
      group by tbl) b on a.id = b.tbl
join (select sum(capacity) as total
      from stv_partitions
      where part_begin = 0) as part on 1 = 1
where a.slice = 0
order by 3 desc, db_id, name;

Then vacuum the table(s) with high unsorted_mbytes: VACUUM your_table;
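If disk space is tight, as in the question, a plain VACUUM (which defaults to FULL) may not be the best first move. A sketch of the less disk-hungry variants, assuming your_table is the table in question:

    -- Reclaim space from deleted rows without re-sorting:
    VACUUM DELETE ONLY your_table;

    -- Re-sort rows without reclaiming deleted space:
    VACUUM SORT ONLY your_table;

    -- Full vacuum, but skip the sort phase if the table is
    -- already at least 75 percent sorted:
    VACUUM FULL your_table TO 75 PERCENT;

The TO threshold PERCENT clause lets the vacuum stop short of a complete re-sort, which reduces the work (and intermediate space) a full vacuum performs.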

answered Sep 27 '22 by Benjamin Crouzier


I have stopped vacuum operations several times; maybe the feature was not available at the time.
Run the query below, which gives you the process id (pid) of the running vacuum query:

select * from stv_recents where status='Running';

Once you have the process id, you can run the following query to terminate the process:

select pg_terminate_backend( pid );
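Putting the two steps together (the pid value 18764 below is purely illustrative; substitute whatever pid the first query returns for your vacuum):

    -- Find the running vacuum's pid:
    select pid, user_name, starttime, query
    from stv_recents
    where status = 'Running';

    -- Terminate that session by pid (18764 is a made-up example):
    select pg_terminate_backend(18764);

Note that pg_terminate_backend kills the whole session, which is more forceful than CANCEL; it can be the only thing that works when CANCEL has no effect.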

answered Sep 27 '22 by Rahul Gupta