I have a table with a billion rows, and I would like to determine the average execution time and its standard deviation for several queries of the form:
select * from mytable where col1 = '36e2ae77-43fa-4efa-aece-cd7b8b669043';
select * from mytable where col1 = '4b58c002-bea4-42c9-8f31-06a499cabc51';
select * from mytable where col1 = 'b97242ae-9f6c-4f36-ad12-baee9afae194';
....
I have a thousand random values for col1 stored in another table.
Is there some way to store how long each of these queries took (in milliseconds) in a separate table, so that I can run some statistics on them? Something like: for each col1 in my random table, execute the query, record the time, then store it in another table.
A completely different approach would be fine, as long as I can stay within PostgreSQL (i.e., I don't want to write an external program to do this).
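(For concreteness, here is a rough sketch of what I have in mind, staying inside PL/pgSQL: loop over the stored values, time each lookup with clock_timestamp(), and write the elapsed milliseconds into a results table. The names random_values(val) and query_timings are just placeholders; mytable and col1 are as above.)

-- results table for the timings
CREATE TABLE IF NOT EXISTS query_timings (
    col1_value text,
    elapsed_ms double precision
);

DO $$
DECLARE
    v       text;
    t_start timestamptz;
BEGIN
    FOR v IN SELECT val FROM random_values LOOP
        t_start := clock_timestamp();
        PERFORM * FROM mytable WHERE col1 = v;  -- run the query, discard the rows
        INSERT INTO query_timings (col1_value, elapsed_ms)
        VALUES (v, EXTRACT(epoch FROM clock_timestamp() - t_start) * 1000);
    END LOOP;
END;
$$;

-- average and standard deviation over all runs
SELECT avg(elapsed_ms) AS avg_ms, stddev(elapsed_ms) AS stddev_ms
FROM query_timings;

This would measure server-side time only, since PERFORM discards the rows instead of sending them anywhere.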
You need to change your PostgreSQL configuration file.
Enable this setting (set it to 0 to log every statement's duration, or to a threshold in milliseconds):

log_min_duration_statement = 0   # -1 is disabled, 0 logs all statements
                                 # and their durations, > 0 logs only
                                 # statements running at least this number
                                 # of milliseconds

After that, execution times will be logged and you will be able to see exactly how well (or badly) your queries are performing.
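If you don't want to edit postgresql.conf by hand, the same parameter can also be set from SQL (superuser rights needed); something along these lines should work:

-- log every statement's duration for the current session only
SET log_min_duration_statement = 0;

-- or persist it for the whole cluster (PostgreSQL 9.4+) and reload
ALTER SYSTEM SET log_min_duration_statement = 0;
SELECT pg_reload_conf();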
You can also use log-parsing utilities such as pgFouine to produce nice HTML reports for further analysis.
Are you aware of the EXPLAIN statement?
This command displays the execution plan that the PostgreSQL planner generates for the supplied statement. The execution plan shows how the table(s) referenced by the statement will be scanned — by plain sequential scan, index scan, etc. — and if multiple tables are referenced, what join algorithms will be used to bring together the required rows from each input table.
The most critical part of the display is the estimated statement execution cost, which is the planner's guess at how long it will take to run the statement (measured in units of disk page fetches). Actually two numbers are shown: the start-up time before the first row can be returned, and the total time to return all the rows. For most queries the total time is what matters, but in contexts such as a subquery in EXISTS, the planner will choose the smallest start-up time instead of the smallest total time (since the executor will stop after getting one row, anyway). Also, if you limit the number of rows to return with a LIMIT clause, the planner makes an appropriate interpolation between the endpoint costs to estimate which plan is really the cheapest.
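For example, against one of the queries above (the plan and the numbers will depend entirely on your schema, indexes, and statistics):

EXPLAIN SELECT * FROM mytable WHERE col1 = '36e2ae77-43fa-4efa-aece-cd7b8b669043';
-- with an index on col1 you would typically get an Index Scan node showing
-- cost=<start-up>..<total> and an estimated row count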
The ANALYZE option causes the statement to be actually executed, not only planned. The total elapsed time expended within each plan node (in milliseconds) and total number of rows it actually returned are added to the display. This is useful for seeing whether the planner's estimates are close to reality.
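So for one of your lookups it would look something like this:

EXPLAIN ANALYZE SELECT * FROM mytable WHERE col1 = '4b58c002-bea4-42c9-8f31-06a499cabc51';
-- each plan node now also shows "actual time=<start-up>..<total> rows=<n>",
-- and the total execution time in milliseconds is reported at the bottom
-- (the exact labels vary between PostgreSQL versions)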
You could pretty easily write a script that runs EXPLAIN ANALYZE on your query for each of the random values in your table and saves the output to a file or another table.
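As a rough sketch of that idea in PL/pgSQL, you could run EXPLAIN (ANALYZE, FORMAT JSON) for each value and pull the reported Execution Time out of the JSON (the names explain_timings and random_values(val) are just placeholders):

CREATE TABLE IF NOT EXISTS explain_timings (
    col1_value   text,
    execution_ms double precision,
    plan         json
);

DO $$
DECLARE
    v    text;
    plan json;
BEGIN
    FOR v IN SELECT val FROM random_values LOOP
        EXECUTE format(
            'EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM mytable WHERE col1 = %L', v)
        INTO plan;
        -- "Execution Time" is present in the JSON output on PostgreSQL 9.4+
        INSERT INTO explain_timings (col1_value, execution_ms, plan)
        VALUES (v, (plan -> 0 ->> 'Execution Time')::double precision, plan);
    END LOOP;
END;
$$;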