For one of the table stats_off
column in table svv_table_info
has value 99% . What does it mean? And how to fix it?
I tried getting the history of anaylse and vaccum for this table. Does Analyse and Vacuum play any role for this column value?
The VACUUM
command will review the table and rearrange the data on disk as appropriate, which will affect the unsorted
and empty
columns. Closer to 0 the better.
The ANALYZE
command will review the table and recompute the statistics asappropriate, which will affect the stats_off
column. Closer to 0 the better.
It is possible that even after running the ANALYZE command, it doesn't change much. To maximize the lowest value possible, you should run the VACUUM command first. The statistics for a table includes old records that have been deleted - in Redshift, they are just skipped, but they still take a toll on the overall query performance. So by running VACUUM on the tables first, you'll be giving the ANALYZE command the best view of the usable data.
Just because a table's statistics are stale doesn't mean it necessary causes an issue. What you need to look for are alarms from the query plan generator to see if it's complaining about statistics on the table. You'll see these complaints raise usually when you perform table joins. This query will look to see if any of those complaints were registered in the last day and provides a list of commands to run if it's needed...
SELECT DISTINCT 'ANALYZE ' + feedback_tbl.schema_name + '.' + feedback_tbl.table_name + ';' AS command
FROM ((SELECT
TRIM(n.nspname) schema_name,
c.relname table_name
FROM (SELECT
TRIM(SPLIT_PART(SPLIT_PART(a.plannode, ':', 2), ' ', 2)) AS Table_Name,
COUNT(a.query),
DENSE_RANK()
OVER (
ORDER BY COUNT(a.query) DESC) AS qry_rnk
FROM stl_explain a,
stl_query b
WHERE a.query = b.query
AND CAST(b.starttime AS DATE) >= dateadd(DAY, -1, CURRENT_DATE)
AND a.userid > 1
AND a.plannode LIKE '%%missing statistics%%'
AND a.plannode NOT LIKE '%%_bkp_%%'
GROUP BY Table_Name) miss_tbl
LEFT JOIN pg_class c ON c.relname = TRIM(miss_tbl.table_name)
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE miss_tbl.qry_rnk <= 25)
-- Get the top N rank tables based on the stl_alert_event_log alerts
UNION
SELECT
schema_name,
table_name
FROM (SELECT
TRIM(n.nspname) schema_name,
c.relname table_name,
DENSE_RANK()
OVER (
ORDER BY COUNT(*) DESC) AS qry_rnk,
COUNT(*)
FROM stl_alert_event_log AS l
JOIN (SELECT
query,
tbl,
perm_table_name
FROM stl_scan
WHERE perm_table_name <> 'Internal Worktable'
GROUP BY query,
tbl,
perm_table_name) AS s ON s.query = l.query
JOIN pg_class c ON c.oid = s.tbl
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE l.userid > 1
AND l.event_time >= dateadd(DAY, -1, CURRENT_DATE)
AND l.Solution LIKE '%%ANALYZE command%%'
GROUP BY TRIM(n.nspname),
c.relname) anlyz_tbl
WHERE anlyz_tbl.qry_rnk < 25) feedback_tbl
JOIN svv_table_info info_tbl
ON info_tbl.schema = feedback_tbl.schema_name
AND info_tbl.table = feedback_tbl.table_name
WHERE info_tbl.stats_off :: DECIMAL(32, 4) > 10 :: DECIMAL(32, 4)
AND TRIM(info_tbl.schema) = 'public'
ORDER BY info_tbl.size ASC;
And while we're at it, this query will review the tables for the VACUUM command...
SELECT 'VACUUM FULL ' + "schema" + '.' + "table" + ';' AS command
FROM svv_table_info
WHERE (unsorted > 5 OR empty > 5)
AND size < 716800;
These queries contain the suggested thresholds as defined by Amazon and is available in their public Python scripts for managing Redshift clusters located here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With