Do the results of a SQL query EXPLAIN depend on the size of the database?

My application uses JPA with Hibernate, and in my log files I see that Hibernate generates some interesting SQL queries with a lot of joins. The application does not have many users right now, and I am worried that some of the queries generated by Hibernate will cause problems when the database grows in size.

I have run some of the SQL queries generated by Hibernate through the EXPLAIN command to look at the resulting query plans.

1. Is the output of EXPLAIN dependent on the size of the database? When my database grows in size, will the query planner generate different plans for the same SQL queries?
2. At what point in the development / deployment cycle should I be looking at the query plans for SQL queries generated by Hibernate? When is the right time to use EXPLAIN?
3. How can the output of EXPLAIN be used to determine whether a query will become a problem, when the database is so small that every query, no matter how complex it looks, runs in under 0.5 seconds?

I am using Postgres 9.1 as the database for my application but I am interested in the general answer to the above questions.

asked Sep 07 '12 07:09

ams

2 Answers

Actually, @ams you are right in your comment - it is generally pointless to use explain with tiny amounts of data.

If a table only has 10 rows then it's quite likely all in one page and it costs (roughly) the same to read one row as all 10. Going to an index first and then fetching the page will be more expensive than just reading the lot and ignoring what you don't want. PostgreSQL's planner has configured costs for things like index reads, table reads, disk accesses vs cache accesses, sorting etc. It sizes these according to the (approximate) size of the tables and distribution of values within them. What it doesn't do (as of the pending 9.2 release) is account for cross-column or cross-table correlations. It also doesn't offer manual hints that let you override the planner's choices (unlike MS-SQL or Oracle).
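The effect of table size on the chosen plan can be seen with a small experiment. This is only a sketch: the table demo_items and its columns are made up for illustration, and the exact plans and cost figures will vary with version and configuration. On a default installation the first EXPLAIN would typically report a sequential scan and the second an index scan on the primary key.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE demo_items (id integer PRIMARY KEY, payload text);

-- 10 rows: the whole table fits in a single page.
INSERT INTO demo_items
SELECT g, 'row ' || g FROM generate_series(1, 10) AS g;
ANALYZE demo_items;
EXPLAIN SELECT * FROM demo_items WHERE id = 5;

-- Grow the table to one million rows and refresh the statistics.
INSERT INTO demo_items
SELECT g, 'row ' || g FROM generate_series(11, 1000000) AS g;
ANALYZE demo_items;
EXPLAIN SELECT * FROM demo_items WHERE id = 5;
```

The query text is identical in both cases; only the statistics the planner sees have changed.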

Each RDBMS' planner has different strengths and weaknesses but I think it's fair to say that MySQL's is the weakest (particularly in older releases).

So - if you want to know how your system will perform with 100 concurrent users and billions of rows you'll want to generate test data and load for a sizeable fraction of that. Worse, you'll want to have roughly the same distribution of values too. If most clients have about 10 invoices but a few have 1000 then that's something your test data will need to reflect. If you need to maintain performance across multiple RDBMS then repeat testing across all of them.
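A skewed data set like the one described can be generated inside PostgreSQL with generate_series. The sketch below assumes made-up tables test_clients and test_invoices, and the 1% share of heavy clients is an arbitrary choice. It gives most clients 10 invoices and every 100th client 1000.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE test_clients (client_id integer PRIMARY KEY);
CREATE TABLE test_invoices (
    invoice_id serial PRIMARY KEY,
    client_id  integer NOT NULL REFERENCES test_clients,
    amount     numeric(10,2) NOT NULL
);
CREATE INDEX test_invoices_client_idx ON test_invoices (client_id);

INSERT INTO test_clients
SELECT g FROM generate_series(1, 10000) AS g;

-- Every 100th client gets 1000 invoices; all others get 10.
INSERT INTO test_invoices (client_id, amount)
SELECT s.client_id, round((random() * 500)::numeric, 2)
FROM (
    SELECT client_id,
           generate_series(1, CASE WHEN client_id % 100 = 0
                                   THEN 1000 ELSE 10 END) AS n
    FROM test_clients
) AS s;

ANALYZE test_clients;
ANALYZE test_invoices;

-- Compare the estimates for a heavy client and a typical one.
EXPLAIN SELECT * FROM test_invoices WHERE client_id = 100;
EXPLAIN SELECT * FROM test_invoices WHERE client_id = 101;
```

The row estimates in the two plans should differ noticeably, which is the kind of distribution effect that uniform test data would hide.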

This is all separate from the overall performance of the system of course, which depends on the size and capabilities of your server vs its required load. A system can cope with a steady increase in load and then suddenly you will see performance drop rapidly as cache sizes are exceeded etc.

HTH

answered Oct 09 '22 16:10

Richard Huxton

1. Is the output of EXPLAIN dependent on the size of the database? When my database grows in size will the query planner generate different plans for the same SQL queries?

It all depends on your data and the statistics about the data. Many performance problems occur because of a lack of statistics: somebody forgot to run ANALYZE, or turned autovacuum (which also runs ANALYZE) off.
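To check whether the statistics are being maintained, something along these lines can be run in the database in question. It uses only the standard pg_stat_user_tables view and needs no particular schema.

```sql
-- Is the autovacuum launcher enabled at all?
SHOW autovacuum;

-- When was each table last analyzed, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;

-- Refresh the statistics for every table in the current database.
ANALYZE;
```

Tables where both timestamp columns are NULL have never been analyzed, so the planner is working from default assumptions for them.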

2. At what point in the development / deployment cycle should I be looking at SQL query plans for sql queries generated by hibernate? When is the right time to use EXPLAIN.

Hibernate has a habit of sending lots and lots of queries to the database, even for simple joins. Turn query logging on and keep an eye on the log. Later on, you could use the auto_explain module to log the plans of those queries automatically.
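As a rough sketch, auto_explain can be tried out in a single session before it is enabled server-wide. The 250ms threshold is an arbitrary value chosen for illustration, and loading the module this way normally requires superuser rights.

```sql
-- Load the contrib module into the current session only.
LOAD 'auto_explain';

-- Log the plan of every statement that runs longer than 250 ms.
SET auto_explain.log_min_duration = '250ms';

-- Include actual row counts and timings (EXPLAIN ANALYZE style output).
-- This adds measurable overhead, so use it with care on busy systems.
SET auto_explain.log_analyze = true;
```

The plans end up in the server log, next to the statements Hibernate generated.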

3. How can the output of explain be used to determine if a query will become a problem, when the database is so small that every query no matter how complex looking runs in under 0.5 seconds?

It can't, because it all depends on the data. When 95% of your users are male, an index on gender won't be used when searching for a man. When you're looking for a woman, the index makes sense and will be used. A partial index covering only the records where gender = female is even better: it's useless to index something that will never benefit from an index, and the index will be much smaller.
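An index restricted to the records where gender = female is what PostgreSQL calls a partial index. A minimal sketch follows, assuming a made-up table demo_users with a one-character gender column and roughly 5% female rows. Whether the planner actually picks the index still depends on the statistics and on how the matching rows are spread over the table.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE demo_users (
    user_id serial PRIMARY KEY,
    gender  char(1) NOT NULL
);

-- Roughly 95% 'M' and 5% 'F'.
INSERT INTO demo_users (gender)
SELECT CASE WHEN g % 20 = 0 THEN 'F' ELSE 'M' END
FROM generate_series(1, 100000) AS g;

-- Partial index: only rows with gender = 'F' are stored in it.
CREATE INDEX demo_users_female_idx
    ON demo_users (user_id)
    WHERE gender = 'F';

ANALYZE demo_users;

-- Compare the plans for the rare and the common value.
EXPLAIN SELECT * FROM demo_users WHERE gender = 'F';
EXPLAIN SELECT * FROM demo_users WHERE gender = 'M';
```

The query for 'M' cannot use this index at all, since none of its rows are in it.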

The only thing you can do to predict index usage is to test with set enable_seqscan = off; that will show whether it is possible to use some index, but that's all.
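Such a test might look like the following. The table demo_orders is a made-up example, and the setting only affects the current session.

```sql
-- Hypothetical table, small enough that a sequential scan is cheapest.
CREATE TABLE demo_orders (order_id integer PRIMARY KEY, status text);
INSERT INTO demo_orders
SELECT g, 'new' FROM generate_series(1, 10) AS g;
ANALYZE demo_orders;

-- Plan with default settings.
EXPLAIN SELECT * FROM demo_orders WHERE order_id = 3;

-- Discourage sequential scans; the planner still uses one
-- if no other access path exists.
SET enable_seqscan = off;
EXPLAIN SELECT * FROM demo_orders WHERE order_id = 3;

-- Restore the default for the rest of the session.
RESET enable_seqscan;
```

If the second plan shows an index scan, the index is usable for that query. It says nothing about whether the planner will prefer it once the table holds real data.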

answered Oct 09 '22 18:10

Frank Heikens

Tags: sql, sql-server, mysql, postgresql, oracle
