Before all thank you for your help!
I want to find out which tables in the database are most heavily used, i.e. the amount of users that query the table, the amount of times it was queried, the resources that where consumed by users per table, the total time the tables where queried, and any other useful data. For now I would limit the analysis to 9 specific tables. I'd tried using stl_scan and pg_user using the next two querys:
SELECT
s.perm_table_name AS table_name,
count(*) AS qty_query,
count(DISTINCT s.userid) AS qty_users
FROM stl_scan s
JOIN pg_user b
ON s.userid = b.usesysid
JOIN temp_mone_tables tmt
ON tmt.table_id = s.tbl AND tmt.table = s.perm_table_name
WHERE s.userid > 1
GROUP BY 1
ORDER BY 1;
SELECT
b.usename AS user_name,
count(*) AS qty_scans,
count(DISTINCT s.tbl) AS qty_tables,
count(DISTINCT trunc(starttime)) AS qty_days
FROM stl_scan s
JOIN pg_user b
ON s.userid = b.usesysid
JOIN temp_mone_tables tmt
ON tmt.table_id = s.tbl AND tmt.table = s.perm_table_name
WHERE s.userid > 1
GROUP BY 1
ORDER BY 1;
The temp_mone_tables is a temporal table that contains the id and name of the tables I'm interested.
With this queries I'm able to get some information but I need more details. Surprisingly there's not much data online about this kind of statistics.
Again thank you all beforehand!
Redshift collects statistics in various ways. Statistics are automatically collected for certain database operations. Collect statistics for entire table or subset of columns using Redshift ANALYZE commands. You can generate statistics on entire database or single table.
Redshift Analyze command is used to collect the statistics on the tables that query planner uses to create optimal query execution plan using Redshift Explain command. Analyze command obtain sample records from the tables, calculate and store the statistics in STL_ANALYZE table.
Amazon Redshift also analyzes new tables that you create with the following commands: Amazon Redshift returns a warning message when you run a query against a new table that was not analyzed after its data was initially loaded. No warning occurs when you query a table after a subsequent update or load.
Keeping statistics current improves query performance by enabling the query planner to choose optimal plans. Amazon Redshift refreshes statistics automatically in the background, and you can also explicitly run the ANALYZE command.
Nice work! You are on the right track using the stl_scan
table. I'm not clear what further details you're looking for.
For detailed metrics on resource usage you may want to use the SVL_QUERY_METRICS_SUMMARY
view. Note that this data is summarized by query not table because a query is the primary way resources are utilized.
Generally, have a look at the admin queries (and views) in our Redshift Utils library on GitHub, particularly v_get_tbl_scan_frequency.sql
Thanks to Joe Harris' answer I was able to add a lot of information to my previous query. With svl_query_metrics_summary joined to stl_scan you get important data about resources consumption, this information can be extended joining them to the vast number of views listed in Joe's answer.
For me the solution begins with the next query:
SELECT *
FROM stl_scan ss
JOIN pg_user pu
ON ss.userid = pu.usesysid
JOIN svl_query_metrics_summary sqms
ON ss.query = sqms.query
JOIN temp_mone_tables tmt
ON tmt.table_id = ss.tbl AND tmt.table = ss.perm_table_name
The query gives you a lot of data that can be summarized in multiple ways as wanted.
Remember that temp_mone_tables is a temp table that contains the tableid and name of the tables I'm interested.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With