Do the results of a SQL query explain depend on the size of the database?

My application is using JPA with Hibernate and I see that hibernate generates some interesting SQL queries with a lot of joins in my log files. The application does not have a lot of users right now and I am worried that some of the queries being generated by hibernate are going to cause problems when the database grows in size.

I have run some of the sql queries generated by hibernate through the EXPLAIN command to look at the query plans that are generated.

  1. Is the output of EXPLAIN dependent on the size of the database? When my database grows in size will the query planner generate different plans for the same SQL queries?

  2. At what point in the development / deployment cycle should I be looking at SQL query plans for sql queries generated by hibernate? When is the right time to use EXPLAIN.

  3. How can the output of explain be used to determine if a query will become a problem, when the database is so small that every query no matter how complex looking runs in under 0.5 seconds?

I am using Postgres 9.1 as the database for my application but I am interested in the general answer to the above questions.

asked Sep 07 '12 07:09 by ams



2 Answers

Actually, @ams, you are right in your comment: it is generally pointless to use EXPLAIN with tiny amounts of data.

If a table has only 10 rows, they almost certainly all sit in one page, so it costs (roughly) the same to read one row as all 10. Going to an index first and then fetching the page is more expensive than just reading the lot and ignoring what you don't want. PostgreSQL's planner has configurable costs for things like index reads, table reads, disk accesses vs cache accesses, sorting etc. It scales these according to the (approximate) size of the tables and the distribution of values within them. What it doesn't do (as of the pending 9.2 release) is account for cross-column or cross-table correlations. It also doesn't offer manual hints that let you override the planner's choices (unlike MS-SQL or Oracle).
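To make the "one page costs the same as ten rows" argument concrete, here is a rough sketch of a cost comparison. The four cost constants are PostgreSQL's documented defaults, but the formulas, ROWS_PER_PAGE and INDEX_DESCENT_PAGES are simplifying assumptions made up for this illustration. This is not the planner's real arithmetic.

```python
# Toy cost model. The constants below are PostgreSQL's default cost
# settings; the formulas are a simplification for illustration only.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005

ROWS_PER_PAGE = 100      # assumption: rows that fit in one table page
INDEX_DESCENT_PAGES = 2  # assumption: index pages read to find the first match


def table_pages(total_rows):
    """Number of pages the table occupies (ceiling division, at least 1)."""
    return max(1, -(-total_rows // ROWS_PER_PAGE))


def seq_scan_cost(total_rows):
    """Read every page sequentially and inspect every row."""
    return table_pages(total_rows) * SEQ_PAGE_COST + total_rows * CPU_TUPLE_COST


def index_scan_cost(total_rows, matching_rows):
    """Walk the index, then fetch table pages with random I/O."""
    # Pessimistic assumption: each match sits on a different page,
    # capped at the number of pages the table actually has.
    heap_pages = min(matching_rows, table_pages(total_rows))
    io = (INDEX_DESCENT_PAGES + heap_pages) * RANDOM_PAGE_COST
    cpu = matching_rows * (CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)
    return io + cpu


def cheaper_plan(total_rows, matching_rows):
    """Name the plan with the lower estimated cost under this toy model."""
    seq = seq_scan_cost(total_rows)
    idx = index_scan_cost(total_rows, matching_rows)
    return "seq scan" if seq <= idx else "index scan"


print(cheaper_plan(10, 1))                 # seq scan
print(cheaper_plan(1_000_000, 1))          # index scan
print(cheaper_plan(1_000_000, 950_000))    # seq scan
```

Even in this crude model the same query flips between plans as the table grows, and an unselective predicate falls back to a sequential scan on a large table.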

Each RDBMS' planner has different strengths and weaknesses but I think it's fair to say that MySQL's is the weakest (particularly in older releases).

So - if you want to know how your system will perform with 100 concurrent users and billions of rows you'll want to generate test data and load for a sizeable fraction of that. Worse, you'll want to have roughly the same distribution of values too. If most clients have about 10 invoices but a few have 1000 then that's something your test data will need to reflect. If you need to maintain performance across multiple RDBMS then repeat testing across all of them.
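As a minimal sketch of what "roughly the same distribution" could mean for the invoices example, the generator below gives most clients about 10 invoices and a small fraction 1000. The function names, the 1% heavy fraction and the spread around 10 are assumptions chosen for illustration; you would replace them with figures measured from your own data.

```python
import random


def make_invoice_counts(n_clients, heavy_fraction=0.01, typical=10,
                        heavy=1000, seed=42):
    """Return one invoice count per client, with a skewed distribution."""
    rng = random.Random(seed)  # fixed seed so test runs are repeatable
    counts = []
    for _ in range(n_clients):
        if rng.random() < heavy_fraction:
            # A few clients are very heavy.
            counts.append(heavy)
        else:
            # Typical clients cluster around `typical`, never below 1.
            counts.append(max(1, round(rng.gauss(typical, 3))))
    return counts


def make_invoice_rows(counts):
    """Expand per-client counts into (client_id, invoice_no) rows."""
    return [
        (client_id, invoice_no)
        for client_id, count in enumerate(counts, start=1)
        for invoice_no in range(1, count + 1)
    ]


counts = make_invoice_counts(10_000)
rows = make_invoice_rows(counts[:100])  # expand a sample of clients
print(len(counts), "clients,", counts.count(1000), "of them heavy")
print(len(rows), "invoice rows for the first 100 clients")
```

The rows could then be written out as CSV and bulk-loaded into the test database, followed by ANALYZE so that the planner sees the skew.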

This is all separate from the overall performance of the system of course, which depends on the size and capabilities of your server vs its required load. A system can cope with a steady increase in load and then suddenly you will see performance drop rapidly as cache sizes are exceeded etc.

HTH

answered Oct 09 '22 16:10 by Richard Huxton


1. Is the output of EXPLAIN dependent on the size of the database? When my database grows in size will the query planner generate different plans for the same SQL queries?

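It all depends on your data and on the statistics the planner holds about that data, so yes, the plan can change as the tables grow. Many performance problems occur because of missing or stale statistics, for example when somebody forgot to run ANALYZE or turned autovacuum (which also runs ANALYZE) off.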

2. At what point in the development / deployment cycle should I be looking at SQL query plans for sql queries generated by hibernate? When is the right time to use EXPLAIN.

Hibernate has a habit of sending lots and lots of queries to the database, even for simple joins. Turn on the query log and keep an eye on it. Later on, you could run EXPLAIN automatically on the queries from your log, for example with the auto_explain module.

3. How can the output of explain be used to determine if a query will become a problem, when the database is so small that every query no matter how complex looking runs in under 0.5 seconds?

You can't, because it all depends on the data. When 95% of your users are male, an index on gender won't be used when searching for a man: the planner expects to read most of the table anyway. When you're looking for a woman, the index makes sense and will be used. A partial index covering only the records where gender = female is even better: it skips the rows that would never benefit from an index, so the index is much smaller.
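The gender example can be sketched with a few lines that compute per-value fractions, which is the kind of information the planner's statistics approximate. The 20% cut-off below is an arbitrary assumption for illustration; PostgreSQL compares estimated costs and has no fixed threshold like this.

```python
from collections import Counter


def value_fractions(values):
    """Fraction of rows holding each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def index_probably_helps(fraction, threshold=0.2):
    """Crude rule of thumb: an index pays off only for selective values.

    The threshold is a made-up assumption, not a PostgreSQL setting.
    """
    return fraction < threshold


# Hypothetical column contents: 95% male, 5% female.
genders = ["male"] * 95 + ["female"] * 5
fractions = value_fractions(genders)

for value, fraction in sorted(fractions.items()):
    print(value, fraction, index_probably_helps(fraction))

# Entries in an index on the whole column vs one covering only 'female' rows.
full_index_entries = len(genders)
partial_index_entries = genders.count("female")
print(full_index_entries, partial_index_entries)  # 100 5
```

With only a handful of development rows these fractions say little about production, which is why the plan for the same query can differ once real data arrives.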

The only thing you can do to predict index usage is to test with set enable_seqscan = off; which discourages sequential scans. That will show whether it is possible to use some index, but that's all.

answered Oct 09 '22 18:10 by Frank Heikens