Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Where can I find usage statistics in Redshift?

Before all thank you for your help!

I want to find out which tables in the database are most heavily used, i.e. the amount of users that query the table, the amount of times it was queried, the resources that where consumed by users per table, the total time the tables where queried, and any other useful data. For now I would limit the analysis to 9 specific tables. I'd tried using stl_scan and pg_user using the next two querys:

SELECT
    s.perm_table_name           AS table_name,
    count(*)                    AS qty_query,
    count(DISTINCT s.userid)    AS qty_users
FROM stl_scan s
JOIN pg_user b
    ON s.userid = b.usesysid
JOIN temp_mone_tables tmt
    ON tmt.table_id = s.tbl AND tmt.table = s.perm_table_name
WHERE s.userid > 1
GROUP BY 1
ORDER BY 1;

SELECT
    b.usename                                       AS user_name,
    count(*)                                        AS qty_scans,
    count(DISTINCT s.tbl)                           AS qty_tables,
    count(DISTINCT trunc(starttime))                AS qty_days
FROM stl_scan s
JOIN pg_user b
    ON s.userid = b.usesysid
JOIN temp_mone_tables tmt
    ON tmt.table_id = s.tbl AND tmt.table = s.perm_table_name
WHERE s.userid > 1
GROUP BY 1
ORDER BY 1;

The temp_mone_tables is a temporal table that contains the id and name of the tables I'm interested.

With this queries I'm able to get some information but I need more details. Surprisingly there's not much data online about this kind of statistics.

Again thank you all beforehand!

like image 804
Nambu14 Avatar asked May 04 '18 14:05

Nambu14


People also ask

How to collect statistics in redshift?

Redshift collects statistics in various ways. Statistics are automatically collected for certain database operations. Collect statistics for entire table or subset of columns using Redshift ANALYZE commands. You can generate statistics on entire database or single table.

What is the use of redshift analyze command?

Redshift Analyze command is used to collect the statistics on the tables that query planner uses to create optimal query execution plan using Redshift Explain command. Analyze command obtain sample records from the tables, calculate and store the statistics in STL_ANALYZE table.

How does Amazon Redshift analyze new tables that I create?

Amazon Redshift also analyzes new tables that you create with the following commands: Amazon Redshift returns a warning message when you run a query against a new table that was not analyzed after its data was initially loaded. No warning occurs when you query a table after a subsequent update or load.

How can I improve the performance of my redshift query?

Keeping statistics current improves query performance by enabling the query planner to choose optimal plans. Amazon Redshift refreshes statistics automatically in the background, and you can also explicitly run the ANALYZE command.


2 Answers

Nice work! You are on the right track using the stl_scan table. I'm not clear what further details you're looking for.

For detailed metrics on resource usage you may want to use the SVL_QUERY_METRICS_SUMMARY view. Note that this data is summarized by query not table because a query is the primary way resources are utilized.

Generally, have a look at the admin queries (and views) in our Redshift Utils library on GitHub, particularly v_get_tbl_scan_frequency.sql

like image 94
Joe Harris Avatar answered Oct 16 '22 20:10

Joe Harris


Thanks to Joe Harris' answer I was able to add a lot of information to my previous query. With svl_query_metrics_summary joined to stl_scan you get important data about resources consumption, this information can be extended joining them to the vast number of views listed in Joe's answer.

For me the solution begins with the next query:

SELECT *
FROM stl_scan ss
JOIN pg_user pu
    ON ss.userid = pu.usesysid
JOIN svl_query_metrics_summary sqms
    ON ss.query = sqms.query
JOIN temp_mone_tables tmt
    ON tmt.table_id = ss.tbl AND tmt.table = ss.perm_table_name

The query gives you a lot of data that can be summarized in multiple ways as wanted.

Remember that temp_mone_tables is a temp table that contains the tableid and name of the tables I'm interested.

like image 30
Nambu14 Avatar answered Oct 16 '22 22:10

Nambu14