SQL Server efficiently filter rows where times are not near another table's times

I have two tables and I'm looking for the rows in one table where a time column is not near any of the values in another table's time column. (Near is defined as within a minute).

Here's a code sample:

create table temp1
(
    id int identity primary key,
    value datetime not null 
)
GO

create index ix_temp1 on temp1(value, id);
GO

set nocount on
insert temp1 (value) values (DATEADD(second, rand() * 1000000, '20100101'))
GO 15000

Table temp2 is set up identically:

create table temp2
(
    id int identity primary key,
    value datetime not null 
)
GO

create index ix_temp2 on temp2(value, id);
GO

set nocount on
insert temp2 (value) values (DATEADD(second, rand() * 1000000, '20100101'))
GO 15000

And here's my first crack at it (which is very inefficient):

SELECT t1.id, t1.value
FROM temp1 t1
LEFT JOIN temp2 t2
    ON t1.value between DATEADD(MINUTE, -1, t2.value) and DATEADD(MINUTE, 1, t2.value)
WHERE t2.value is null

I'm looking for ways to do this more efficiently. All solutions will be considered (new indexes, SSIS solutions, CLR solutions, temp tables, cursors, etc.).

asked Sep 15 '10 15:09

Michael J Swart

3 Answers

The LEFT JOIN/IS NULL isn't as efficient on SQL Server as NOT IN or NOT EXISTS when columns are not nullable - see this link for details.

That said, this:

SELECT t1.id,
       t1.value
  FROM temp1 t1
 WHERE NOT EXISTS(SELECT NULL
                    FROM temp2 t2
                   WHERE t2.value BETWEEN DATEADD(MINUTE, -1, t1.value)  
                                      AND DATEADD(MINUTE, 1, t1.value))

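...still applies a function (i.e. DATEADD) to a column, which is worth checking because a function wrapped around an indexed column renders that index useless: you're altering the data of the column (temporarily, without writing it back to the table) while the index is on the original value. In this query DATEADD wraps t1.value rather than t2.value, so temp1 is scanned (as it must be, since every row is checked), but t2.value is compared as stored and the index on temp2 can still be used for a range seek.

Beyond that, I'm at a loss for options if you want the precision. Otherwise, if you alter the datetime before it's inserted into the temp table then you gain: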

- the ability to compare directly: t1.value = t2.value
- the ability to use the index, assuming the optimizer believes it can be of use
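
As a rough sketch of that trade-off, the values could be truncated to the minute on the way into working tables and then compared for equality. This assumes "near" can be relaxed to "falls in the same calendar minute", which is not the same as "within 60 seconds" (00:00:59 and 00:01:01 would no longer count as near). The names #temp1_minute, #temp2_minute, value_minute and ix_temp2_minute are hypothetical and not part of the original schema.

```sql
-- Assumption: "near" is relaxed to "same calendar minute" (loses precision).
-- All #temp*_minute objects and the value_minute column are hypothetical names.

-- Copy temp1, adding the value truncated to the minute
SELECT id,
       value,
       DATEADD(MINUTE, DATEDIFF(MINUTE, 0, value), 0) AS value_minute
INTO   #temp1_minute
FROM   temp1;

-- Only the truncated value is needed from temp2
SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, 0, value), 0) AS value_minute
INTO   #temp2_minute
FROM   temp2;

CREATE INDEX ix_temp2_minute ON #temp2_minute (value_minute);

-- Plain equality on the stored values, so the index can be used for a seek
SELECT t1.id,
       t1.value
FROM   #temp1_minute t1
WHERE  NOT EXISTS (SELECT NULL
                   FROM   #temp2_minute t2
                   WHERE  t2.value_minute = t1.value_minute);
```

Whether the loss of precision is acceptable depends on how strictly "within a minute" has to hold.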

answered Oct 17 '22 15:10

OMG Ponies

This seems to do it pretty quickly:

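SELECT t.id,
       t.value
FROM 
(
   SELECT t1.id, 
          t1.value, 
          -- nearest temp2 value at or after t1.value (NULL if there is none)
          (SELECT MIN(temp2.value) FROM temp2 WHERE temp2.value >= t1.value) as theNext, 
          -- nearest temp2 value at or before t1.value (NULL if there is none)
          (SELECT MAX(temp2.value) FROM temp2 WHERE temp2.value <= t1.value) as thePrev
   FROM temp1 t1
) t 
-- a missing neighbour (NULL) cannot be near, so keep the row in that case
WHERE (t.theNext IS NULL OR DATEDIFF(second, t.value, t.theNext) > 60)
  AND (t.thePrev IS NULL OR DATEDIFF(second, t.thePrev, t.value) > 60)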

and it doesn't require any restructure of your tables.

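Make sure to use seconds for the comparison: DATEDIFF with minutes counts the minute boundaries crossed rather than the elapsed time, so two values a few seconds apart can report a difference of one minute. This runs in less than a second on my machine using your specifications for table creation.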
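
A small illustration of why the unit matters, using two made-up values (@a and @b are illustrative variables, not from the question) that are two seconds apart but sit on either side of a minute boundary:

```sql
-- Illustrative values only: 2 seconds apart, straddling a minute boundary
DECLARE @a datetime, @b datetime;
SET @a = '2010-01-01T00:00:59';
SET @b = '2010-01-01T00:01:01';

SELECT DATEDIFF(MINUTE, @a, @b) AS minute_diff,  -- 1: one minute boundary crossed
       DATEDIFF(SECOND, @a, @b) AS second_diff;  -- 2: the actual gap in seconds
```

A filter on the minute difference would treat this pair as a full minute apart, which is why the query compares seconds against 60 instead.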

EDIT: Added <= and >= to theNext and thePrev calculations. This prevents a false positive where temp1.value is equal to temp2.value.

answered Oct 17 '22 15:10

Nathan Wheeler

Answer Rewritten

For your original query, changing the join condition from

LEFT JOIN temp2 t2
 ON t1.value BETWEEN DATEADD(MINUTE, -1, t2.value) AND DATEADD(MINUTE, 1, t2.value)

to

LEFT JOIN temp2 t2
 ON t2.value BETWEEN DATEADD(MINUTE, -1, t1.value) AND DATEADD(MINUTE, 1, t1.value)

makes a huge difference.

Both plans scan temp1 as the outer input to the nested loops iterator. In the first, however, the condition on temp2 is not sargable, so the whole of temp2 has to be scanned for each row in temp1. In the second, it can do a much more reasonable range seek on the index to retrieve the matching row(s).

However, the NOT EXISTS solution as per @OMG's answer is more efficient in SQL Server.

Execution Plans:

(Ignore the "Cost Relative to the Batch" for the second one - The estimated rows are way off actual so this figure is misleading)

ExecutionPlans http://img812.imageshack.us/img812/457/executionplans.jpg
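
One way to reproduce the comparison without the screenshot is to run both join forms with I/O and timing statistics switched on and compare the logical reads reported for temp2. This is only a sketch against the temp1/temp2 tables from the question; the actual numbers will depend on the data and the server.

```sql
-- Report logical reads and CPU/elapsed time for each statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Form 1: DATEADD wraps t2.value, so the predicate on temp2 is not sargable
SELECT t1.id, t1.value
FROM temp1 t1
LEFT JOIN temp2 t2
    ON t1.value BETWEEN DATEADD(MINUTE, -1, t2.value) AND DATEADD(MINUTE, 1, t2.value)
WHERE t2.value IS NULL;

-- Form 2: DATEADD wraps t1.value, so t2.value can be matched with a range seek
SELECT t1.id, t1.value
FROM temp1 t1
LEFT JOIN temp2 t2
    ON t2.value BETWEEN DATEADD(MINUTE, -1, t1.value) AND DATEADD(MINUTE, 1, t1.value)
WHERE t2.value IS NULL;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The per-table read counts appear in the messages output, and the NOT EXISTS version can be measured the same way.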

answered Oct 17 '22 15:10

Martin Smith

Tags: sql, sql-server, tsql, sql-server-2005
