Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Query against 250k rows taking 53 seconds

The box this query is running on is a dedicated server running in a datacenter.

AMD Opteron 1354 Quad-Core 2.20GHz 2GB of RAM Windows Server 2008 x64 (Yes I know I only have 2GB of RAM, I'm upgrading to 8GB when the project goes live).

So I went through and created 250,000 dummy rows in a table to really stress test some queries that LINQ to SQL generates and make sure they're not to terrible and I noticed one of them was taking an absurd amount of time.

I had this query down to 17 seconds with indexes but I removed them for the sake of this answer to go from start to finish. Only indexes are Primary Keys.

Stories table --
[ID] [int] IDENTITY(1,1) NOT NULL,
[UserID] [int] NOT NULL,
[CategoryID] [int] NOT NULL,
[VoteCount] [int] NOT NULL,
[CommentCount] [int] NOT NULL,
[Title] [nvarchar](96) NOT NULL,
[Description] [nvarchar](1024) NOT NULL,
[CreatedAt] [datetime] NOT NULL,
[UniqueName] [nvarchar](96) NOT NULL,
[Url] [nvarchar](512) NOT NULL,
[LastActivityAt] [datetime] NOT NULL,

Categories table --
[ID] [int] IDENTITY(1,1) NOT NULL,
[ShortName] [nvarchar](8) NOT NULL,
[Name] [nvarchar](64) NOT NULL,

Users table --
[ID] [int] IDENTITY(1,1) NOT NULL,
[Username] [nvarchar](32) NOT NULL,
[Password] [nvarchar](64) NOT NULL,
[Email] [nvarchar](320) NOT NULL,
[CreatedAt] [datetime] NOT NULL,
[LastActivityAt] [datetime] NOT NULL,

Currently in the database there is 1 user, 1 category and 250,000 stories and I tried to run this query.

SELECT TOP(10) *
FROM Stories
INNER JOIN Categories ON Categories.ID = Stories.CategoryID
INNER JOIN Users ON Users.ID = Stories.UserID
ORDER BY Stories.LastActivityAt

Query takes 52 seconds to run, CPU usage hovers at 2-3%, Membery is 1.1GB, 900MB free but the Disk usage seems out of control. It's @ 100MB/sec with 2/3 of that being writes to tempdb.mdf and the rest is reading from tempdb.mdf.

Now for the interesting part...

SELECT TOP(10) *
FROM Stories
INNER JOIN Categories ON Categories.ID = Stories.CategoryID
INNER JOIN Users ON Users.ID = Stories.UserID

SELECT TOP(10) *
FROM Stories
INNER JOIN Users ON Users.ID = Stories.UserID
ORDER BY Stories.LastActivityAt

SELECT TOP(10) *
FROM Stories
INNER JOIN Categories ON Categories.ID = Stories.CategoryID
ORDER BY Stories.LastActivityAt

All 3 of these queries are pretty much instant.

Exec plan for first query.
http://i43.tinypic.com/xp6gi1.png

Exec plans for other 3 queries (in order).
http://i43.tinypic.com/30124bp.png
http://i44.tinypic.com/13yjml1.png
http://i43.tinypic.com/33ue7fb.png

Any help would be much appreciated.

Exec plan after adding indexes (down to 17 seconds again).
http://i39.tinypic.com/2008ytx.png

I've gotten a lot of helpful feedback from everyone and I thank you, I tried a new angle at this. I query the stories I need, then in separate queries get the Categories and Users and with 3 queries it only took 250ms... I don't understand the issue but if it works and at 250ms no less for the time being I'll stick with that. Here's the code I used to test this.

DBDataContext db = new DBDataContext();
Console.ReadLine();

Stopwatch sw = Stopwatch.StartNew();

var stories = db.Stories.OrderBy(s => s.LastActivityAt).Take(10).ToList();
var storyIDs = stories.Select(c => c.ID);
var categories = db.Categories.Where(c => storyIDs.Contains(c.ID)).ToList();
var users = db.Users.Where(u => storyIDs.Contains(u.ID)).ToList();

sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
like image 442
Chad Moran Avatar asked Mar 04 '09 01:03

Chad Moran


People also ask

Why is my SQL query taking so long?

There are a number of things that may cause a query to take longer time to execute: Inefficient query – Use non-indexed columns while lookup or joining, thus MySQL takes longer time to match the condition. Table lock – The table is locked, by global lock or explicit table lock when the query is trying to access it.

How do I know which query takes a long time?

You can view this by Right Clicking on Instance Name in SQL Server Management Studio and selecting “Activity Monitor”. Activity monitor tells you what the current and recent activities are in your SQL Server Instance. The above screenshot displays an overview window for the Activity Monitor.

How long should a SQL query take?

The query takes 20 to 500 ms (or sometimes more) depending on the system and the amount of data. The performance of the database or the database server has a significant influence on the speed.

Why is my query faster the second time it runs?

When you run your query for the first time and the data is not in cache, the server read the data from disk. It is time-comsuming. The second time you execute the same query data is already in cache so it requires less time.


1 Answers

Try adding an index on Stories.LastActivityAt. I think the clustered index scan in the execution plan may be due to the sorting.

Edit: Since my query returned in an instant with rows just a few bytes long, but has been running for 5 minutes already and is still going after I added a 2K varchar, I think Mitch has a point. It is the volume of that data that is shuffled around for nothing, but this can be fixed in the query.

Try putting the join, sort and top(10) in a view or in a nested query, and then join back against the story table to get the rest of the data just for the 10 rows that you need.

Like this:

select * from 
(
    SELECT TOP(10) id, categoryID, userID
    FROM Stories
    ORDER BY Stories.LastActivityAt
) s
INNER JOIN Stories ON Stories.ID = s.id
INNER JOIN Categories ON Categories.ID = s.CategoryID
INNER JOIN Users ON Users.ID = s.UserID

If you have an index on LastActivityAt, this should run very fast.

like image 170
cdonner Avatar answered Oct 19 '22 18:10

cdonner