How to select rows, and nearby rows

Background

I have a table of values, some of which need attention:

| ID      | AddedDate   |
|---------|-------------|
|       1 | 2010-04-01  |
|       2 | 2010-04-01  |
|       3 | 2010-04-02  |
|       4 | 2010-04-02  |
|       5 | NULL        | <----------- needs attention
|       6 | 2010-04-02  |
|       7 | 2010-04-03  |
|       8 | 2010-04-04  |
|       9 | 2010-04-04  |
| 2432659 | 2016-06-15  |
| 2432650 | 2016-06-16  |
| 2432651 | 2016-06-17  |
| 2432672 | 2016-06-18  |
| 2432673 | NULL        | <----------- needs attention
| 2432674 | 2016-06-20  |
| 2432685 | 2016-06-21  |

I want to select the rows where AddedDate is NULL, along with the rows around each of them. For this example it is sufficient to define "around" as rows whose ID is within ±3 of a NULL row's ID. This means I want:

| ID      | AddedDate   |
|---------|-------------|
|       2 | 2010-04-01  | ─╮
|       3 | 2010-04-02  |  │
|       4 | 2010-04-02  |  │
|       5 | NULL        |  ├──ID values ±3
|       6 | 2010-04-02  |  │
|       7 | 2010-04-03  |  │
|       8 | 2010-04-04  | ─╯

| 2432672 | 2016-06-18  | ─╮
| 2432673 | NULL        |  ├──ID values ±3
| 2432674 | 2016-06-20  | ─╯
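To pin the requirement down, here is a brute-force reference in Python over the sample rows above. `ITEMS` and `ids_near_nulls` are hypothetical names, and `None` stands for NULL. It compares every row with every NULL ID, the same rows × NULL-rows shape as a naive SQL approach, so it is only meant for checking results on small samples, not for the 9M-row table.

```python
# Hypothetical copy of the sample table as (ID, AddedDate); None stands for NULL.
ITEMS = [
    (1, "2010-04-01"), (2, "2010-04-01"), (3, "2010-04-02"), (4, "2010-04-02"),
    (5, None), (6, "2010-04-02"), (7, "2010-04-03"), (8, "2010-04-04"),
    (9, "2010-04-04"), (2432659, "2016-06-15"), (2432650, "2016-06-16"),
    (2432651, "2016-06-17"), (2432672, "2016-06-18"), (2432673, None),
    (2432674, "2016-06-20"), (2432685, "2016-06-21"),
]


def ids_near_nulls(items, radius=3):
    """Brute force: sorted IDs within +/- radius of any ID whose AddedDate is NULL."""
    null_ids = [row_id for row_id, added in items if added is None]
    return sorted(
        row_id
        for row_id, _ in items
        if any(n - radius <= row_id <= n + radius for n in null_ids)
    )


print(ids_near_nulls(ITEMS))
# [2, 3, 4, 5, 6, 7, 8, 2432672, 2432673, 2432674]
```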

Note: In reality it's a table of 9M rows, and 15k need attention.

Attempts

First I create a query that builds the ranges I'm interested in returning:

SELECT
  ID-3 AS [Low ID],
  ID+3 AS [High ID]
FROM Items
WHERE AddedDate IS NULL

Low ID   High ID
-------  -------
2        8 
2432670  2432676

My initial attempt, which uses these ranges, does work:

WITH dt AS (
   SELECT ID-3 AS Low, ID+3 AS High
   FROM Items
   WHERE AddedDate IS NULL
)
SELECT * FROM Items
WHERE EXISTS(
    SELECT 1 FROM dt
    WHERE Items.ID BETWEEN dt.Low AND dt.High)
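A minimal sketch for reproducing this attempt on the sample rows, assuming Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for SQL Server. The query text is unchanged apart from an added `ORDER BY`; `ITEMS`, `EXISTS_QUERY` and `run_exists_query` are hypothetical names. It only confirms which rows the logic returns. It says nothing about SQL Server's plan or timing on 9 million rows.

```python
import sqlite3

# Sample rows from the question as (ID, AddedDate); None is stored as SQL NULL.
ITEMS = [
    (1, "2010-04-01"), (2, "2010-04-01"), (3, "2010-04-02"), (4, "2010-04-02"),
    (5, None), (6, "2010-04-02"), (7, "2010-04-03"), (8, "2010-04-04"),
    (9, "2010-04-04"), (2432659, "2016-06-15"), (2432650, "2016-06-16"),
    (2432651, "2016-06-17"), (2432672, "2016-06-18"), (2432673, None),
    (2432674, "2016-06-20"), (2432685, "2016-06-21"),
]

# The attempt from the question, plus ORDER BY so the output is deterministic.
EXISTS_QUERY = """
WITH dt AS (
   SELECT ID-3 AS Low, ID+3 AS High
   FROM Items
   WHERE AddedDate IS NULL
)
SELECT * FROM Items
WHERE EXISTS(
    SELECT 1 FROM dt
    WHERE Items.ID BETWEEN dt.Low AND dt.High)
ORDER BY ID
"""


def run_exists_query(items):
    """Load items into an in-memory SQLite table and run the EXISTS query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Items (ID INTEGER PRIMARY KEY, AddedDate TEXT)")
        conn.executemany("INSERT INTO Items (ID, AddedDate) VALUES (?, ?)", items)
        return conn.execute(EXISTS_QUERY).fetchall()
    finally:
        conn.close()


for row in run_exists_query(ITEMS):
    print(row)
```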

But when I try it on real data:

9 million total rows
15,000 interesting rows
estimated subtree cost of 63,318,400
it takes hours (before I give up and cancel it)


There's probably a more efficient way.

Bonus Reading

Select a row and rows around it
Select Rows with matching columns from SQL Server
How can I search for rows "around" a given string value?
How to get N rows starting from row M from sorted table in T-SQL


asked Feb 24 '18 16:02

Ian Boyd

2 Answers

This is your existing logic rewritten using a moving max:

WITH dt AS (
   SELECT
      ID, AddedDate,
      -- check if there's a NULL within a range of +/- 3 rows
      -- and remember its ID
      max(case when AddedDate is null then id end)
      over (order by id 
            rows between 3 preceding and 3 following) as NullID
   FROM Items 
)
SELECT *
FROM dt
where id between NullID-3 and NullID+3
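A sketch for checking this query against the sample rows, assuming Python's built-in `sqlite3` module and an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for SQL Server. `ITEMS`, `MOVING_MAX_QUERY` and `run_moving_max_query` are hypothetical names; the SQL follows the answer, with an added `ORDER BY`. The window is measured in rows, but the final filter is on ID values, so on the sample data it returns exactly the ±3-ID rows from the question.

```python
import sqlite3

# Sample rows from the question as (ID, AddedDate); None is stored as SQL NULL.
ITEMS = [
    (1, "2010-04-01"), (2, "2010-04-01"), (3, "2010-04-02"), (4, "2010-04-02"),
    (5, None), (6, "2010-04-02"), (7, "2010-04-03"), (8, "2010-04-04"),
    (9, "2010-04-04"), (2432659, "2016-06-15"), (2432650, "2016-06-16"),
    (2432651, "2016-06-17"), (2432672, "2016-06-18"), (2432673, None),
    (2432674, "2016-06-20"), (2432685, "2016-06-21"),
]

MOVING_MAX_QUERY = """
WITH dt AS (
   SELECT
      ID, AddedDate,
      -- largest ID of a NULL row within 3 rows on either side
      MAX(CASE WHEN AddedDate IS NULL THEN ID END)
      OVER (ORDER BY ID
            ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS NullID
   FROM Items
)
SELECT ID, AddedDate
FROM dt
WHERE ID BETWEEN NullID-3 AND NullID+3
ORDER BY ID
"""


def run_moving_max_query(items):
    """Load items into an in-memory SQLite table and run the moving-max query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Items (ID INTEGER PRIMARY KEY, AddedDate TEXT)")
        conn.executemany("INSERT INTO Items (ID, AddedDate) VALUES (?, ?)", items)
        return conn.execute(MOVING_MAX_QUERY).fetchall()
    finally:
        conn.close()


print([r[0] for r in run_moving_max_query(ITEMS)])
# [2, 3, 4, 5, 6, 7, 8, 2432672, 2432673, 2432674]

# Hypothetical edge case: gaps in the IDs and two NULLs inside one 7-row window.
GAPPY = [(1, "2010-04-01"), (2, "2010-04-01"), (3, None), (10, None), (11, "2010-04-02")]
print([r[0] for r in run_moving_max_query(GAPPY)])
# [10, 11]  (IDs 1, 2 and the NULL row 3 are dropped)
```

The edge case shows one caveat. With unique integer IDs, every row within ±3 IDs of a NULL is also within 3 rows of it, so the window always sees the right NULLs; the loss comes from keeping only the single largest one. When two NULLs share a window and the IDs have gaps, rows near the smaller NULL (including that NULL row itself) can be filtered out.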


answered Sep 27 '22 15:09

dnoeth

Here is one method that uses the windowing clause:

select i.*
from (select i.*,
             count(*) over (order by id rows between 3 preceding and 1 preceding) as cnt_prec,
             count(*) over (order by id rows between 1 following and 3 following) as cnt_foll,
             count(addeddate) over (order by id rows between 3 preceding and 1 preceding) as cnt_ad_prec,
             count(addeddate) over (order by id rows between 1 following and 3 following) as cnt_ad_foll
      from items i
     ) i
where cnt_ad_prec <> cnt_prec or
      cnt_ad_foll <> cnt_foll or
      addeddate is null
order by id;

This returns all rows that have NULL in the column or are within three rows of a NULL.

The comparison against the plain row counts (rather than against a fixed 3) avoids edge problems at the smallest and largest IDs, where fewer than three rows precede or follow.
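A sketch for checking this query against the sample rows, assuming Python's built-in `sqlite3` module and an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for SQL Server. `ITEMS`, `COUNT_QUERY` and `run_count_query` are hypothetical names; the SQL declares the inner alias `i` and ends with `ORDER BY`. Because the frames count row positions rather than ID values, the result differs from the question's ±3-ID definition when IDs have gaps: on the sample data it also returns 2432651, 2432659 and 2432685, which are within three rows of 2432673 but more than three IDs away.

```python
import sqlite3

# Sample rows from the question as (ID, AddedDate); None is stored as SQL NULL.
ITEMS = [
    (1, "2010-04-01"), (2, "2010-04-01"), (3, "2010-04-02"), (4, "2010-04-02"),
    (5, None), (6, "2010-04-02"), (7, "2010-04-03"), (8, "2010-04-04"),
    (9, "2010-04-04"), (2432659, "2016-06-15"), (2432650, "2016-06-16"),
    (2432651, "2016-06-17"), (2432672, "2016-06-18"), (2432673, None),
    (2432674, "2016-06-20"), (2432685, "2016-06-21"),
]

# COUNT(*) counts rows in each frame; COUNT(AddedDate) skips NULLs.
# A mismatch means a NULL lies within 3 rows before or after the current row.
COUNT_QUERY = """
SELECT i.ID, i.AddedDate
FROM (SELECT i.*,
             COUNT(*) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS cnt_prec,
             COUNT(*) OVER (ORDER BY ID ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING) AS cnt_foll,
             COUNT(AddedDate) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS cnt_ad_prec,
             COUNT(AddedDate) OVER (ORDER BY ID ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING) AS cnt_ad_foll
      FROM Items i
     ) i
WHERE cnt_ad_prec <> cnt_prec
   OR cnt_ad_foll <> cnt_foll
   OR AddedDate IS NULL
ORDER BY i.ID
"""


def run_count_query(items):
    """Load items into an in-memory SQLite table and run the counting query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Items (ID INTEGER PRIMARY KEY, AddedDate TEXT)")
        conn.executemany("INSERT INTO Items (ID, AddedDate) VALUES (?, ?)", items)
        return conn.execute(COUNT_QUERY).fetchall()
    finally:
        conn.close()


print([r[0] for r in run_count_query(ITEMS)])
# [2, 3, 4, 5, 6, 7, 8, 2432651, 2432659, 2432672, 2432673, 2432674, 2432685]
```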

answered Sep 27 '22 16:09

Gordon Linoff


Tags: sql, sql-server, sql-server-2012