Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this query faster without index?

I inherited a new system and I am trying to make some improvements on the data. I am trying to improve this table and can't seem to make sense of my findings.

I have the following table structure:

CREATE TABLE [dbo].[Calls](
    [CallID] [varchar](8) NOT NULL PRIMARY KEY,
    [RecvdDate] [varchar](10) NOT NULL,
    [yr] [int] NOT NULL,
    [Mnth] [int] NOT NULL,
    [CallStatus] [varchar](50) NOT NULL,
    [Category] [varchar](100) NOT NULL,
    [QCall] [varchar](15) NOT NULL,
    [KOUNT] [int] NOT NULL)

This table has about 220k records in it. I need to return all records that have a date greater than specific date. In this case 12/1/2009. This query will return about 66k records and it takes about 4 seconds to run. From past systems I have worked on this seems high. Especially given how few records are in the table. So I would like to bring that time down.

So I'm wondering what would be some good ways to bring that down? I tried adding a date column to the table and converting the string date to an actual date column. Then I added an index on that date column but the time stayed the same. Given that there aren't that many records I can see how a table scan could be fast but I would think that an index could bring that time down.

I have also considered just querying off the month and year columns. But I haven't tried it yet. And would like to keep it off the date column if possible. But if not I can change it.

Any help is appreciated.

EDIT: Here is the query I am trying to run and test the speed of the table. I usually put out the columns but just for simplicity I used * :

SELECT *
FROM _FirstSlaLevel_Tickets_New
WHERE TicketRecvdDateTime >= '12/01/2009'

EDIT 2: So I mentioned that I had tried to create a table with a date column that contained the recvddate data but as a date rather than a varchar. That is what TicketRecvdDateTime column is in the query above. The original query I am running against this table is:

SELECT *
FROM Calls
WHERE CAST(RecvdDate AS DATE) >= '12/01/2009'
like image 829
spinon Avatar asked Nov 29 '10 18:11

spinon


3 Answers

You may be encountering what is referred to as the Tipping Point in SQL Server. Even though you have the appropriate index on the column, SQL Server may decided to do a table scan anyway if the expected number of rows returned exceeds some threshold (the 'tipping point').

In your example, this seems likely since your is turning 1/4 of the number of rows in the database. The following is a good article that explains this: http://www.sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx

like image 65
Randy Minder Avatar answered Sep 23 '22 08:09

Randy Minder


SELECT * will usually give a poor performance.

Either the index will be ignored or you'll end up with a key/bookmark lookup into the clustered index. No matter: both can run badly.

For example, if you had this query, and the index on TicketRecvdDateTime INCLUDEd CallStatus, then it would most likely run as expected. This would be covering

SELECT CallStatus
FROM _FirstSlaLevel_Tickets_New
WHERE TicketRecvdDateTime >= '12/01/2009'

This is in addition to Randy Minder's answer: a key/bookmark lookup may be cheap enough for a handful of rows but not for a large chunk of the table data.

like image 28
gbn Avatar answered Sep 22 '22 08:09

gbn


Your query is faster w/o an index (or, more precisly, is the same speed w/ or w/o the indeX) because and index on RecvdDate will always be ignored in an expression like CAST(RecvdDate AS DATE) >= '12/01/2009'. This is a non-SARG-able expression, as it requires the column to be transformed trough a function. In order for this index event to be considered, you have to express your filter criteria exactly on the column being indexed, not on an expression based on it. This would be the first step.

There are more steps:

  • Get rid of the VARCHAR(10) column for dates and replace it with the appropriate DATE or DATETIME column. Storing date and/or time as strings is riddled with problems. Not only for indexing, but also for correctness.
  • A table that is frequently scanned on a range based on a column (as most such call log tables are) should be clustered by that column.
  • It is highly unlikely you really need the yr and mnth columns. If you really do need them, then you probably need them as computed columns.

.

CREATE TABLE [dbo].[Calls](
    [CallID] [varchar](8) NOT NULL,
    [RecvdDate] [datetime](10) NOT NULL,
    [CallStatus] [varchar](50) NOT NULL,
    [Category] [varchar](100) NOT NULL,
    [QCall] [varchar](15) NOT NULL,
    [KOUNT] [int] NOT NULL,
    CONSTRAINT [PK_Calls_CallId] PRIMARY KEY NONCLUSTERED ([CallID]));

CREATE CLUSTERED INDEX cdxCalls ON Calls(RecvDate);

SELECT *
FROM Calls
WHERE RecvDate >= '12/01/2009';

Of course, the proper structure of the table and indexes should be the result of careful analysis, considering all factors involved, including update performance, other queries etc. I recommend you start by going through all the topics included in Designing Indexes.

like image 26
Remus Rusanu Avatar answered Sep 21 '22 08:09

Remus Rusanu