Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What does it mean when stats_off column in table svv_table_info has value 99%?

For one of the table stats_off column in table svv_table_info has value 99% . What does it mean? And how to fix it?

I tried getting the history of anaylse and vaccum for this table. Does Analyse and Vacuum play any role for this column value?

like image 363
Siddharth Dutta Avatar asked Mar 09 '23 19:03

Siddharth Dutta


1 Answers

The VACUUM command will review the table and rearrange the data on disk as appropriate, which will affect the unsorted and empty columns. Closer to 0 the better.

The ANALYZE command will review the table and recompute the statistics asappropriate, which will affect the stats_off column. Closer to 0 the better.

It is possible that even after running the ANALYZE command, it doesn't change much. To maximize the lowest value possible, you should run the VACUUM command first. The statistics for a table includes old records that have been deleted - in Redshift, they are just skipped, but they still take a toll on the overall query performance. So by running VACUUM on the tables first, you'll be giving the ANALYZE command the best view of the usable data.

Just because a table's statistics are stale doesn't mean it necessary causes an issue. What you need to look for are alarms from the query plan generator to see if it's complaining about statistics on the table. You'll see these complaints raise usually when you perform table joins. This query will look to see if any of those complaints were registered in the last day and provides a list of commands to run if it's needed...

SELECT DISTINCT 'ANALYZE ' + feedback_tbl.schema_name + '.' + feedback_tbl.table_name + ';' AS command
FROM ((SELECT
         TRIM(n.nspname) schema_name,
         c.relname       table_name
       FROM (SELECT
               TRIM(SPLIT_PART(SPLIT_PART(a.plannode, ':', 2), ' ', 2)) AS Table_Name,
               COUNT(a.query),
               DENSE_RANK()
               OVER (
                 ORDER BY COUNT(a.query) DESC)                          AS qry_rnk
             FROM stl_explain a,
               stl_query b
             WHERE a.query = b.query
                   AND CAST(b.starttime AS DATE) >= dateadd(DAY, -1, CURRENT_DATE)
                   AND a.userid > 1
                   AND a.plannode LIKE '%%missing statistics%%'
                   AND a.plannode NOT LIKE '%%_bkp_%%'
             GROUP BY Table_Name) miss_tbl
         LEFT JOIN pg_class c ON c.relname = TRIM(miss_tbl.table_name)
         LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
       WHERE miss_tbl.qry_rnk <= 25)
      -- Get the top N rank tables based on the stl_alert_event_log alerts
      UNION
      SELECT
        schema_name,
        table_name
      FROM (SELECT
              TRIM(n.nspname)              schema_name,
              c.relname                    table_name,
              DENSE_RANK()
              OVER (
                ORDER BY COUNT(*) DESC) AS qry_rnk,
              COUNT(*)
            FROM stl_alert_event_log AS l
              JOIN (SELECT
                      query,
                      tbl,
                      perm_table_name
                    FROM stl_scan
                    WHERE perm_table_name <> 'Internal Worktable'
                    GROUP BY query,
                      tbl,
                      perm_table_name) AS s ON s.query = l.query
              JOIN pg_class c ON c.oid = s.tbl
              JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
            WHERE l.userid > 1
                  AND l.event_time >= dateadd(DAY, -1, CURRENT_DATE)
                  AND l.Solution LIKE '%%ANALYZE command%%'
            GROUP BY TRIM(n.nspname),
              c.relname) anlyz_tbl
      WHERE anlyz_tbl.qry_rnk < 25) feedback_tbl
  JOIN svv_table_info info_tbl
    ON info_tbl.schema = feedback_tbl.schema_name
       AND info_tbl.table = feedback_tbl.table_name
WHERE info_tbl.stats_off :: DECIMAL(32, 4) > 10 :: DECIMAL(32, 4)
      AND TRIM(info_tbl.schema) = 'public'
ORDER BY info_tbl.size ASC;

And while we're at it, this query will review the tables for the VACUUM command...

SELECT 'VACUUM FULL ' + "schema" + '.' + "table" + ';' AS command
FROM svv_table_info
WHERE (unsorted > 5 OR empty > 5)
      AND size < 716800;

These queries contain the suggested thresholds as defined by Amazon and is available in their public Python scripts for managing Redshift clusters located here.

like image 189
William Holroyd Avatar answered Apr 28 '23 05:04

William Holroyd