Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server query - not performing as expected, not behaving like I thought it would

I have an advanced SQL question for your SQL perf gurus out there :-)

I'm currently trying to understand some behavior in a larger application, but it boils down to a query against these two tables:

  • Users table - roughly 750 entries, UserId (varchar(50)) as clustered PK
  • ActionLog table - millions of entries, includes UserId - but no FK relationship

For a grid in my ASP.NET application, I'm trying to get all users plus their last log entry's date.

The SQL statement that is currently being used looks something like this:

SELECT
     UserId, (other columns),
     LastLogDate = (SELECT TOP (1) [Timestamp] FROM dbo.ActionLog a WHERE a.UserId = u.UserId ORDER BY [Timestamp] DESC) 
FROM
     dbo.Users u;

and it returns the rows to display - but it's fairly slow (approx. 20 seconds).

My first thought was to add an index on the ActionLog table on UserId and to include the Timestamp column in it:

CREATE NONCLUSTERED INDEX [IDX_UserId]
ON [dbo].[ActionLog]([UserId] ASC)
INCLUDE ([Timestamp])

The rows are now returned very quickly - under 2 seconds, with 350'000 entries in the ActionLog table, and my index is being used just fine, as the execution plan shows me. All seems fine.

Now, for approximating the production scenario, we loaded roughly 2 million rows into the ActionLog table, 95% or more of which refer to a non-existing user (i.e. these rows have a UserId that doesn't exist in the Users table).

Now suddenly, the query becomes extremely slow (24 minutes!), and the index isn't being used anymore.

I assumed that since the vast majority of the entries in the ActionLog table don't line up with an existing user, I would see performance gains if I use a filtered index - to "weed out" all those messy entries without a corresponding user - so I created this index (replacing the other one that existed before):

CREATE NONCLUSTERED INDEX [IDX_UserId]
ON [dbo].[Log]([UserId] ASC)
INCLUDE ([Timestamp])
WHERE UserId <> 'user'    -- that's the fixed, non-existing "UserId" I wanted to avoid

But to my dismay - the query is still about the same - takes over 20 minutes to complete. I updated the statistics - no change - still extremely slow.

Funny thing (to me) is: when I dropped the index and re-created it -> now the query was really fast again (again less than 3 seconds). WOW!

But as soon as I start adding more entries again, the query "tilts" and becomes really really slow.......

I don't fully understand why this is happening - I was thinking that with a filtered index that eliminates all those "rogue" entries, I would be seeing good performance on trying to find the newest ActionLog entry for existing users - but that doesn't seem to be the case.

WHY NOT?

Any ideas? Thoughts? Things to try??

like image 475
marc_s Avatar asked Nov 11 '14 20:11

marc_s


People also ask

Why is MySQL Server query suddenly slow?

WAITING: Queries can be slow because they're waiting on a bottleneck for a long time. See a detailed list of bottlenecks in types of Waits. RUNNING: Queries can be slow because they're running (executing) for a long time. In other words, these queries are actively using CPU resources.


2 Answers

First, INCLUDE here is not the best choice. You sort by entry date, but included columns aren't sorted. Better solution would be:

CREATE NONCLUSTERED INDEX [IX_ActionLog_UserIdTimestamp] ON [dbo].[ActionLog]
([UserId], [Timestamp]);

Second, looks like you might need to update statistics on your index more often than an automatic update would. I have seen cases when, in a situation akin to yours, I had to update stats every 10 minutes, due to excessive inserts. That was back on 2005, though.

like image 87
Roger Wolf Avatar answered Sep 28 '22 20:09

Roger Wolf


Try this query and see how it performs with your original index or with the modified suggested by @Roger Wolf:

SELECT u.UserId, a.LastLogDate 
FROM dbo.Users u
INNER JOIN (
    SELECT UserId, Max([TimeStamp]) AS LastLogDate
    FROM dbo.ActionLog 
    WHERE userid <> 'user' -- the user to filter out
    GROUP BY UserId
) a ON a.UserId = u.UserId

If it sucks I'll delete the answer :)

like image 43
jpw Avatar answered Sep 28 '22 19:09

jpw