
How to select rows, and nearby rows

SQL Fiddle

Background

I have a table of values, some of which need attention:

| ID      | AddedDate   |
|---------|-------------|
|       1 | 2010-04-01  |
|       2 | 2010-04-01  |
|       3 | 2010-04-02  |
|       4 | 2010-04-02  |
|       5 | NULL        | <----------- needs attention
|       6 | 2010-04-02  |
|       7 | 2010-04-03  |
|       8 | 2010-04-04  |
|       9 | 2010-04-04  |
| 2432659 | 2016-06-15  |
| 2432650 | 2016-06-16  |
| 2432651 | 2016-06-17  |
| 2432672 | 2016-06-18  |
| 2432673 | NULL        | <----------- needs attention
| 2432674 | 2016-06-20  |
| 2432685 | 2016-06-21  |

I want to select the rows where AddedDate is NULL, and I also want to select the rows around them. For this example it is sufficient to take rows whose ID is within ±3 of a NULL row's ID. This means I want:

| ID      | AddedDate   |
|---------|-------------|
|       2 | 2010-04-01  | ─╮
|       3 | 2010-04-02  |  │
|       4 | 2010-04-02  |  │
|       5 | NULL        |  ├──ID values ±3
|       6 | 2010-04-02  |  │
|       7 | 2010-04-03  |  │
|       8 | 2010-04-04  | ─╯

| 2432672 | 2016-06-18  | ─╮
| 2432673 | NULL        |  ├──ID values ±3
| 2432674 | 2016-06-20  | ─╯

Note: In reality it's a table of 9M rows, and 15k need attention.
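
To reproduce the sample above, here is a minimal setup sketch. It assumes the table is named Items (the name used in the queries in this question), that ID is an int primary key, and that AddedDate is a nullable date. The column types are assumptions, not taken from the real schema.

```sql
-- Assumed schema: the real column types are not stated in the question.
CREATE TABLE Items (
    ID        int  NOT NULL PRIMARY KEY,
    AddedDate date NULL
);

-- Sample rows copied from the table above.
INSERT INTO Items (ID, AddedDate) VALUES
    (1,       '2010-04-01'),
    (2,       '2010-04-01'),
    (3,       '2010-04-02'),
    (4,       '2010-04-02'),
    (5,       NULL),           -- needs attention
    (6,       '2010-04-02'),
    (7,       '2010-04-03'),
    (8,       '2010-04-04'),
    (9,       '2010-04-04'),
    (2432659, '2016-06-15'),
    (2432650, '2016-06-16'),
    (2432651, '2016-06-17'),
    (2432672, '2016-06-18'),
    (2432673, NULL),           -- needs attention
    (2432674, '2016-06-20'),
    (2432685, '2016-06-21');
```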

Attempts

First I create a query that builds the ranges I'm interested in returning:

SELECT
  ID-3 AS [Low ID],
  ID+3 AS [High ID]
FROM Items
WHERE AddedDate IS NULL

Low ID   High ID
-------  -------
2        8 
2432670  2432676

So my initial attempt to use this does work:

WITH dt AS (
   SELECT ID-3 AS Low, ID+3 AS High
   FROM Items
   WHERE AddedDate IS NULL
)
SELECT * FROM Items
WHERE EXISTS(
    SELECT 1 FROM dt
    WHERE Items.ID BETWEEN dt.Low AND dt.High)

But when I try it on real data:

  • 9 million total rows
  • 15,000 interesting rows
  • subtree cost of 63,318,400
  • it takes hours (before I give up and cancel it)


There's probably a more efficient way.

Bonus Reading

  • Select a row and rows around it
  • Select Rows with matching columns from SQL Server
  • How can I search for rows "around" a given string value?
  • How to get N rows starting from row M from sorted table in T-SQL
Ian Boyd asked Feb 24 '18 16:02



2 Answers

This is your existing logic rewritten using a moving max:

WITH dt AS (
   SELECT
      ID, AddedDate,
      -- check if there's a NULL within a range of +/- 3 rows
      -- and remember its ID
      max(case when AddedDate is null then id end)
      over (order by id 
            rows between 3 preceding and 3 following) as NullID
   FROM Items 
)
SELECT *
FROM dt
where id between NullID-3 and NullID+3
dnoeth answered Sep 27 '22 15:09


Here is one method that uses the windowing clause:

select i.*
from (select i.*,
             count(*) over (order by id rows between 3 preceding and 1 preceding) as cnt_prec,
             count(*) over (order by id rows between 1 following and 3 following) as cnt_foll,
             count(addeddate) over (order by id rows between 3 preceding and 1 preceding) as cnt_ad_prec,
             count(addeddate) over (order by id rows between 1 following and 3 following) as cnt_ad_foll
      from items
     ) i
where cnt_ad_prec <> cnt_prec or
      cnt_ad_foll <> cnt_foll or
      addeddate is null
order by id;

This returns all rows that have NULL in the column or are within three rows of a NULL.

The comparison against the row counts is needed to avoid edge issues at the smallest and largest IDs, where fewer than three neighbouring rows exist on one side.
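
To see that edge effect, here is a small sketch that lists the preceding-frame counts per row. It assumes the sample Items table from the question, with the same column names; it is only an illustration, not part of the query above.

```sql
-- cnt_prec counts the rows that exist in the preceding frame;
-- cnt_ad_prec counts only those whose AddedDate is not NULL.
SELECT ID,
       AddedDate,
       COUNT(*)         OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS cnt_prec,
       COUNT(AddedDate) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS cnt_ad_prec
FROM Items
ORDER BY ID;
```

On the sample data, the rows with ID 1, 2, and 3 have cnt_prec of 0, 1, and 2, each equal to cnt_ad_prec. Comparing cnt_ad_prec against a fixed 3 would wrongly flag those rows, whereas comparing it against cnt_prec does not.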

Gordon Linoff answered Sep 27 '22 16:09