
How to Write Optimal SQL Queries

Tags:

sql

I've searched around Stack Overflow, but everybody asks how to optimize queries they've already written.

I want to know the basics: what to do and what to avoid when writing a query.

For example, it's a known fact that writing SELECT * FROM is something to avoid, given that the SQL engine has to make an "invisible" query to know which columns should be shown.

I've also heard that Id BETWEEN @min_number AND @max_number works better than Id >= @min_number AND Id <= @max_number, but I don't recall why. It may be that BETWEEN is handled at a lower level by the engine, which iterates over the matching records in some pre-optimized way, but I just don't know for sure.

Could someone validate those claims and make a list of the most common dos and don'ts?

asked May 02 '11 by apacay


4 Answers

My list is SQL Server specific (I'm sure there are lots more):

Use sargable WHERE clauses - among other things, that means no functions (especially scalar UDFs) applied to columns in the WHERE clause.
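
For illustration, a minimal sketch assuming a hypothetical dbo.Orders table and a scalar UDF dbo.fn_ToYear (both names invented here): wrapping the column in the function hides it from the index, while leaving the column bare allows an index seek.

-- Non-sargable: the UDF must run for every row, so an index on OrderDate can't be seeked
SELECT OrderId
FROM dbo.Orders
WHERE dbo.fn_ToYear(OrderDate) = 2011;

-- Sargable: the column is left alone and compared against a range
SELECT OrderId
FROM dbo.Orders
WHERE OrderDate >= '20110101'
  AND OrderDate < '20120101';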

WHERE NOT EXISTS tends to be faster than a LEFT JOIN with a WHERE id IS NULL structure when you are looking for rows that have no match in a second table.
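
A sketch of both forms, using hypothetical Customers and Orders tables, looking for customers with no orders:

-- LEFT JOIN ... IS NULL version
SELECT c.CustomerId
FROM dbo.Customers c
LEFT JOIN dbo.Orders o ON o.CustomerId = c.CustomerId
WHERE o.CustomerId IS NULL;

-- NOT EXISTS version, often the faster plan on SQL Server
SELECT c.CustomerId
FROM dbo.Customers c
WHERE NOT EXISTS (SELECT 1 FROM dbo.Orders o WHERE o.CustomerId = c.CustomerId);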

Correlated subqueries tend to run row by row and are horribly slow.
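
For example (hypothetical tables again), a correlated subquery in the SELECT list can usually be rewritten as a single grouped join:

-- Correlated: the inner query is conceptually evaluated once per customer row
SELECT c.CustomerId,
       (SELECT COUNT(*) FROM dbo.Orders o WHERE o.CustomerId = c.CustomerId) AS OrderCount
FROM dbo.Customers c;

-- Set-based rewrite: one pass over Orders, then a join
SELECT c.CustomerId, ISNULL(o.OrderCount, 0) AS OrderCount
FROM dbo.Customers c
LEFT JOIN (SELECT CustomerId, COUNT(*) AS OrderCount
           FROM dbo.Orders
           GROUP BY CustomerId) o ON o.CustomerId = c.CustomerId;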

Views that call other views can't be indexed and become very slow, especially once you get several levels deep on large tables.

SELECT * is to be avoided, especially when you have a join, as at least one column is sent twice, which wastes server, database, and network resources.
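
A quick sketch with the same hypothetical tables (column names invented); the join column comes back twice with SELECT *:

-- Both tables carry CustomerId, so SELECT * returns it twice
SELECT *
FROM dbo.Customers c
JOIN dbo.Orders o ON o.CustomerId = c.CustomerId;

-- Listing only the columns you need avoids the duplicate and the extra traffic
SELECT c.CustomerId, c.CustomerName, o.OrderId, o.OrderDate
FROM dbo.Customers c
JOIN dbo.Orders o ON o.CustomerId = c.CustomerId;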

Cursors can usually be replaced with much faster set-based logic. When you store data in the correct way, you can avoid a lot of on-the-fly transformations.
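
As a rough sketch (hypothetical Orders table with ShippedDate and Status columns), the same work done with a cursor and then as a single set-based statement:

-- Row-by-row cursor version (slow)
DECLARE @OrderId int;
DECLARE order_cursor CURSOR FOR
    SELECT OrderId FROM dbo.Orders WHERE ShippedDate IS NULL;
OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @OrderId;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET Status = 'Pending' WHERE OrderId = @OrderId;
    FETCH NEXT FROM order_cursor INTO @OrderId;
END;
CLOSE order_cursor;
DEALLOCATE order_cursor;

-- Set-based version: one statement does the same work
UPDATE dbo.Orders
SET Status = 'Pending'
WHERE ShippedDate IS NULL;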

When updating, make sure you add a WHERE clause so that you don't update rows where the new value and the old value are the same. This could be the difference between updating 10,000,000 rows and updating 15. Sample (T-SQL UPDATE structure; if you use another database you may have to look up the correct syntax, but it should give you the idea):

UPDATE t
SET field1 = t2.field2
FROM table1 t
JOIN table2 t2 ON t.tid = t2.tid
WHERE t.field1 <> t2.field2

Or

UPDATE t
SET field1 = @variable
FROM table1 t
WHERE t.field1 <> @variable

Check your indexing. SQL Server does not automatically index foreign keys. If they are used in a join, they generally need to be indexed.
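
For instance, if a hypothetical Orders.CustomerId column references Customers and is used in joins, the foreign key constraint alone doesn't give you an index; you create one yourself:

-- The FK constraint enforces the relationship but does not create this index
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId);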

If you are constantly applying functions to a field, you are probably not storing it correctly (or you should have a persisted computed column so the transformation is done only once, not every time you select the column).
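
A small sketch of the persisted computed column approach (hypothetical names): the expression is stored at write time instead of being re-evaluated in every query, and it can even be indexed.

-- Store the year once instead of calling YEAR() in every SELECT
ALTER TABLE dbo.Orders
    ADD OrderYear AS YEAR(OrderDate) PERSISTED;

-- The stored value can now be filtered on directly and indexed
CREATE NONCLUSTERED INDEX IX_Orders_OrderYear
    ON dbo.Orders (OrderYear);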

Your best bet is to get a good performance tuning book for your database of choice (what works best is very database specific) and read the chapters on writing queries.

answered Oct 08 '22 by HLGEM


  • Views are macros, not magic
  • EXISTS and NOT EXISTS usually work best
  • Functions on columns (see Joel C's answer)
  • Beware implicit conversion (e.g. a smallint column compared to an int parameter)
  • Understand covering indexes (see the sketch after this list)
  • Denormalise only after you see issues
  • Understand aggregates: stop thinking in loops
  • ...
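
On covering indexes, a minimal sketch with invented names: the INCLUDE clause carries the extra columns the query needs, so the query can be answered from the index alone without touching the base table.

-- Target query:
--   SELECT OrderId, TotalDue FROM dbo.Orders WHERE CustomerId = @CustomerId;
-- A covering index satisfies it entirely from the index pages
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Covering
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderId, TotalDue);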

Edit, Feb 2012:

Avoid these "Ten Common SQL Programming Mistakes"

answered Oct 08 '22 by gbn


In your WHERE clause, avoid using a column as an input to a function, as this can cause a full table scan instead of an index seek. The query optimizer on some platforms does a better job than others, but it's generally better to be safe. For instance, if you're looking for records from the past 30 days, apply the date manipulation to the value you're comparing against, not to your column:

BAD

WHERE DATEADD(DAY, 30, [RecordDate]) > GETDATE()

This may cause a full table scan (depending on the query optimizer for your platform), even if [RecordDate] is indexed, because DATEADD(DAY, 30, [RecordDate]) has to be evaluated to compare it against GETDATE(). If you change it to:

BETTER

WHERE [RecordDate] > DATEADD(DAY, -30, GETDATE())

This will now always be able to use an index on [RecordDate], regardless of how good the query plan optimizer is on your platform, because DATEADD(DAY, -30, GETDATE()) is evaluated once and can then be used as a lookup in the index. The same principle applies to using a CASE statement, UDFs, etc.

answered Oct 08 '22 by Joel C


A few general points about optimizing queries:

  • Know your data. Know your data. Know your data. I would venture to guess that half of all database performance problems stem from an incomplete understanding of the data and the requirements of the query. Know whether your query will usually return 50 rows or 5 million rows. Know whether you need to get back 3 columns or 50 columns. Know which columns are key columns on the tables, and filter on those.

  • Understand your database structure. If you're working with a database in third normal form, recognize that this structure typically works best for lots of small transactional statements operating on individual rows. If you are working with a star or snowflake design, recognize that it's optimized for large queries and aggregations.

answered Oct 08 '22 by N West