I have a table with a billion rows, and I would like to determine the average execution time and its standard deviation for several queries of the form:
select * from mytable where col1 = '36e2ae77-43fa-4efa-aece-cd7b8b669043';
select * from mytable where col1 = '4b58c002-bea4-42c9-8f31-06a499cabc51';
select * from mytable where col1 = 'b97242ae-9f6c-4f36-ad12-baee9afae194';
....
I have a thousand random values for col1 stored in another table.
Is there some way to store how long each of these queries took (in milliseconds) in a separate table, so that I can run some statistics on them? Something like: for each col1 in my random table, execute the query, record the time, then store it in another table.
A completely different approach would be fine, as long as I can stay within PostgreSQL (i.e., I don't want to write an external program to do this).
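(For concreteness, here is a rough sketch of what I have in mind, staying inside PL/pgSQL: loop over the stored values, time each lookup with clock_timestamp(), and write the elapsed milliseconds into a results table. The names random_values(val) and query_timings are just placeholders; mytable and col1 are as above.)

-- results table for the timings
CREATE TABLE IF NOT EXISTS query_timings (
    col1_value text,
    elapsed_ms double precision
);

DO $$
DECLARE
    v       text;
    t_start timestamptz;
BEGIN
    FOR v IN SELECT val FROM random_values LOOP
        t_start := clock_timestamp();
        PERFORM * FROM mytable WHERE col1 = v;  -- run the query, discard the rows
        INSERT INTO query_timings (col1_value, elapsed_ms)
        VALUES (v, EXTRACT(epoch FROM clock_timestamp() - t_start) * 1000);
    END LOOP;
END;
$$;

-- average and standard deviation over all runs
SELECT avg(elapsed_ms) AS avg_ms, stddev(elapsed_ms) AS stddev_ms
FROM query_timings;

This would measure server-side time only, since PERFORM discards the rows instead of sending them anywhere.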
You need to change your PostgreSQL configuration file.
Enable this setting (set it to 0 to log every statement's duration, or to a threshold in milliseconds):

log_min_duration_statement = 0   # -1 is disabled, 0 logs all statements
                                 # and their durations, > 0 logs only
                                 # statements running at least this number
                                 # of milliseconds

After that, execution times will be logged and you will be able to see exactly how well (or badly) your queries are performing.
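If you don't want to edit postgresql.conf by hand, the same parameter can also be set from SQL (superuser rights needed); something along these lines should work:

-- log every statement's duration for the current session only
SET log_min_duration_statement = 0;

-- or persist it for the whole cluster (PostgreSQL 9.4+) and reload
ALTER SYSTEM SET log_min_duration_statement = 0;
SELECT pg_reload_conf();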
You can also use log-parsing utilities such as pgFouine to produce nice HTML reports for further analysis.
Are you aware of the EXPLAIN statement?
This command displays the execution plan that the PostgreSQL planner generates for the supplied statement. The execution plan shows how the table(s) referenced by the statement will be scanned — by plain sequential scan, index scan, etc. — and if multiple tables are referenced, what join algorithms will be used to bring together the required rows from each input table.
The most critical part of the display is the estimated statement execution cost, which is the planner's guess at how long it will take to run the statement (measured in units of disk page fetches). Actually two numbers are shown: the start-up time before the first row can be returned, and the total time to return all the rows. For most queries the total time is what matters, but in contexts such as a subquery in EXISTS, the planner will choose the smallest start-up time instead of the smallest total time (since the executor will stop after getting one row, anyway). Also, if you limit the number of rows to return with a LIMIT clause, the planner makes an appropriate interpolation between the endpoint costs to estimate which plan is really the cheapest.
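For example, against one of the queries above (the plan and the numbers will depend entirely on your schema, indexes, and statistics):

EXPLAIN SELECT * FROM mytable WHERE col1 = '36e2ae77-43fa-4efa-aece-cd7b8b669043';
-- with an index on col1 you would typically get an Index Scan node showing
-- cost=<start-up>..<total> and an estimated row count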
The ANALYZE option causes the statement to be actually executed, not only planned. The total elapsed time expended within each plan node (in milliseconds) and total number of rows it actually returned are added to the display. This is useful for seeing whether the planner's estimates are close to reality.
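So for one of your lookups it would look something like this:

EXPLAIN ANALYZE SELECT * FROM mytable WHERE col1 = '4b58c002-bea4-42c9-8f31-06a499cabc51';
-- each plan node now also shows "actual time=<start-up>..<total> rows=<n>",
-- and the total execution time in milliseconds is reported at the bottom
-- (the exact labels vary between PostgreSQL versions)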
You could pretty easily write a script that runs EXPLAIN ANALYZE on your query for each of the random values in your table and saves the output to a file or another table.
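As a rough sketch of that idea in PL/pgSQL, you could run EXPLAIN (ANALYZE, FORMAT JSON) for each value and pull the reported Execution Time out of the JSON (the names explain_timings and random_values(val) are just placeholders):

CREATE TABLE IF NOT EXISTS explain_timings (
    col1_value   text,
    execution_ms double precision,
    plan         json
);

DO $$
DECLARE
    v    text;
    plan json;
BEGIN
    FOR v IN SELECT val FROM random_values LOOP
        EXECUTE format(
            'EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM mytable WHERE col1 = %L', v)
        INTO plan;
        -- "Execution Time" is present in the JSON output on PostgreSQL 9.4+
        INSERT INTO explain_timings (col1_value, execution_ms, plan)
        VALUES (v, (plan -> 0 ->> 'Execution Time')::double precision, plan);
    END LOOP;
END;
$$;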