
Can this SQL Query be optimized to run faster?

I have a SQL query (for SQL Server 2008 R2) that takes a very long time to complete. Is there a better way of doing it?

SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND t.Code NOT IN (SELECT Code FROM ExcludedCodes)

Table1 has around 90 million rows and is indexed by Name and Code. ExcludedCodes has only around 30 rows.

This query is in a stored procedure and gets called around 40k times; in total the procedure takes 27 minutes to finish. I believe this is my biggest bottleneck because of the massive number of rows it queries against and the number of times it does so.

So if you know of a good way to optimize this, it would be greatly appreciated! If it cannot be optimized then I guess I'm stuck with 27 minutes...

EDIT

I changed the NOT IN to NOT EXISTS and it cut the time down to 10:59, so that alone is a massive gain on my part. I am still going to attempt the GROUP BY statement as suggested below, but that will require a complete rewrite of the stored procedure and might take some time... (as I said before, I'm not the best at SQL, but it is starting to grow on me. ^^)
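
One commonly cited difference between the two forms is how they treat NULLs: if the subquery behind NOT IN can return a NULL, the predicate is never true, whereas NOT EXISTS simply ignores the NULL row. The following is a minimal sketch of that difference, using Python's built-in sqlite3 module as a stand-in for SQL Server so it can be run anywhere. The sample rows and the nullable Code column in ExcludedCodes are assumptions made up for illustration, and the sketch says nothing about timings on the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Name TEXT NOT NULL, Code TEXT NOT NULL);
    CREATE TABLE ExcludedCodes (Code TEXT);  -- nullable on purpose (assumption)
    INSERT INTO Table1 VALUES ('a', 'X'), ('a', 'Y'), ('a', 'Z'), ('b', 'X');
    INSERT INTO ExcludedCodes VALUES ('X');
""")

# The original form of the query.
NOT_IN = """
    SELECT COUNT(Name) FROM Table1 t
    WHERE t.Name = ? AND t.Code NOT IN (SELECT Code FROM ExcludedCodes)
"""

# The rewritten form, correlated on Code.
NOT_EXISTS = """
    SELECT COUNT(Name) FROM Table1 t
    WHERE t.Name = ? AND NOT EXISTS
        (SELECT 1 FROM ExcludedCodes x WHERE x.Code = t.Code)
"""

def count(sql, name):
    """Run one of the two count queries for a single name."""
    return conn.execute(sql, (name,)).fetchone()[0]

# Without NULLs in ExcludedCodes both forms agree.
before = (count(NOT_IN, "a"), count(NOT_EXISTS, "a"))

# A single NULL in the subquery result empties the NOT IN result.
conn.execute("INSERT INTO ExcludedCodes VALUES (NULL)")
after = (count(NOT_IN, "a"), count(NOT_EXISTS, "a"))

print(before, after)
```

With this sample data the script prints `(2, 2) (0, 2)`. If ExcludedCodes.Code is declared NOT NULL the two forms return the same counts, so the rewrite is then purely a matter of which plan the optimizer picks.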

Brandon Stout asked Dec 16 '22 22:12


2 Answers

In addition to workarounds to get the query itself to respond faster, have you considered maintaining a column in the table that tells whether it is in this set or not? It requires a lot of maintenance but if the ExcludedCodes table does not change often, it might be better to do that maintenance. For example you could add a BIT column:

ALTER TABLE dbo.Table1 ADD IsExcluded BIT;

Make it NOT NULL and default to 0. Then you could create a filtered index:

CREATE INDEX n ON dbo.Table1(name)
  WHERE IsExcluded = 0;

Now you just have to update the table once:

UPDATE t
  SET IsExcluded = 1
  FROM dbo.Table1 AS t
  INNER JOIN dbo.ExcludedCodes AS x
  ON t.Code = x.Code;

And ongoing you'd have to maintain this with triggers on both tables. With this in place, your query becomes:

SELECT @Count = COUNT(Name)
  FROM dbo.Table1
  WHERE Name = @name AND IsExcluded = 0;
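
To make the maintenance burden concrete, here is a rough sketch of the flag-plus-triggers idea, written against SQLite through Python's built-in sqlite3 module so that it is runnable as-is. It is not T-SQL: SQLite triggers fire per row and use NEW/OLD, whereas SQL Server triggers fire per statement and use the inserted/deleted tables, so treat it as an outline of the logic only. The trigger names and sample rows are hypothetical, the count filters by name as the original query does, and updates that change Table1.Code are not handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ExcludedCodes (Code TEXT PRIMARY KEY);
    CREATE TABLE Table1 (
        Name TEXT NOT NULL,
        Code TEXT NOT NULL,
        IsExcluded INTEGER NOT NULL DEFAULT 0
    );

    -- Partial index: SQLite's counterpart of a filtered index.
    CREATE INDEX n ON Table1(Name) WHERE IsExcluded = 0;

    -- New rows in Table1 are flagged if their code is already excluded.
    CREATE TRIGGER trg_table1_insert AFTER INSERT ON Table1
    BEGIN
        UPDATE Table1 SET IsExcluded = 1
        WHERE rowid = NEW.rowid
          AND EXISTS (SELECT 1 FROM ExcludedCodes x WHERE x.Code = NEW.Code);
    END;

    -- Excluding a code flags the existing rows that carry it.
    CREATE TRIGGER trg_excluded_insert AFTER INSERT ON ExcludedCodes
    BEGIN
        UPDATE Table1 SET IsExcluded = 1 WHERE Code = NEW.Code;
    END;

    -- Un-excluding a code clears the flag again.
    CREATE TRIGGER trg_excluded_delete AFTER DELETE ON ExcludedCodes
    BEGIN
        UPDATE Table1 SET IsExcluded = 0 WHERE Code = OLD.Code;
    END;
""")

def count_for(name):
    """Count the non-excluded rows for one name using only the flag."""
    sql = "SELECT COUNT(Name) FROM Table1 WHERE Name = ? AND IsExcluded = 0"
    return conn.execute(sql, (name,)).fetchone()[0]

conn.execute("INSERT INTO ExcludedCodes VALUES ('X')")
conn.executemany(
    "INSERT INTO Table1 (Name, Code) VALUES (?, ?)",
    [("a", "X"), ("a", "Y"), ("a", "Z")],
)
c1 = count_for("a")   # only 'X' is excluded

conn.execute("INSERT INTO ExcludedCodes VALUES ('Y')")
c2 = count_for("a")   # 'X' and 'Y' are excluded

conn.execute("DELETE FROM ExcludedCodes WHERE Code = 'X'")
c3 = count_for("a")   # only 'Y' is excluded

print(c1, c2, c3)
```

The trade-off is the one described above: every write to either table pays a little extra so that the count no longer has to consult ExcludedCodes at all.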

EDIT

As for "NOT IN being slower than LEFT JOIN" here is a simple test I performed on only a few thousand rows:

[image: results of the test comparing NOT IN with LEFT JOIN]

EDIT 2

I'm not sure why this query wouldn't do what you're after, and be far more efficient than your 40K loop:

SELECT src.Name, COUNT(*)
  FROM dbo.Table1 AS src
  INNER JOIN #temptable AS t
  ON src.Name = t.Name
  WHERE src.Code NOT IN (SELECT Code FROM dbo.ExcludedCodes)
  GROUP BY src.Name;

Or the LEFT JOIN equivalent:

SELECT src.Name, COUNT(*)
  FROM dbo.Table1 AS src
  INNER JOIN #temptable AS t
  ON src.Name = t.Name
  LEFT OUTER JOIN dbo.ExcludedCodes AS x
  ON src.Code = x.Code
  WHERE x.Code IS NULL
  GROUP BY src.Name;

I would put money on either of those queries taking less than 27 minutes. I would even suggest that running both queries sequentially will be far faster than your one query that takes 27 minutes.
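
As a small sanity check of the set-based idea, the sketch below compares a per-name loop with a single grouped query. It again uses Python's built-in sqlite3 module in place of SQL Server, and the temp table `names` (standing in for #temptable) and the sample rows are assumptions for illustration only. It also shows one difference worth knowing before rewriting the procedure: a name whose rows are all excluded gets a count of 0 from the loop but produces no row at all in the grouped result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Name TEXT NOT NULL, Code TEXT NOT NULL);
    CREATE TABLE ExcludedCodes (Code TEXT NOT NULL);
    CREATE TEMP TABLE names (Name TEXT PRIMARY KEY);  -- stands in for #temptable
    INSERT INTO Table1 VALUES
        ('a', 'X'), ('a', 'Y'), ('a', 'Z'), ('b', 'Y'), ('c', 'X');
    INSERT INTO ExcludedCodes VALUES ('X');
    INSERT INTO names VALUES ('a'), ('b'), ('c');
""")

# Loop version: one count query per name, like the 40K calls.
per_name = {}
for (name,) in conn.execute("SELECT Name FROM names").fetchall():
    per_name[name] = conn.execute(
        """
        SELECT COUNT(*) FROM Table1 t
        WHERE t.Name = ?
          AND NOT EXISTS (SELECT 1 FROM ExcludedCodes x WHERE x.Code = t.Code)
        """,
        (name,),
    ).fetchone()[0]

# Set-based version: a single query using the LEFT JOIN form.
grouped = dict(conn.execute(
    """
    SELECT src.Name, COUNT(*)
    FROM Table1 AS src
    JOIN names AS t ON src.Name = t.Name
    LEFT JOIN ExcludedCodes AS x ON src.Code = x.Code
    WHERE x.Code IS NULL
    GROUP BY src.Name
    """
).fetchall())

print(per_name)
print(grouped)
```

Here the loop yields `{'a': 2, 'b': 1, 'c': 0}` while the grouped query yields `{'a': 2, 'b': 1}`. If the zero counts matter, one option is to start from the names table and LEFT JOIN to the filtered rows, counting a column from the joined side instead of `*`.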

Finally, you might consider an indexed view. I don't know your table structure or whether you violate any of the restrictions, but it is worth investigating IMHO.

Aaron Bertrand answered Dec 18 '22 11:12


You say this gets called around 40K times. Why? Is it in a cursor? If so, do you really need a cursor? Couldn't you put the values you want for @name in a temp table, index it, and then join to it?

select t.name, count(t.name) 
from Table1 t
join #name n on t.name = n.name 
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name

That might get you all your results in one query and is almost certainly faster than 40K separate queries. Of course, if you need the count of all the names, it's even simpler:

select t.name, count(t.name) 
from Table1 t
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name

HLGEM answered Dec 18 '22 12:12