Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to do binary search by table with known data order in specific fields SQL

I have a table with structure like this:

create table to_much_data
(
    id primary key clustered,
    dt datetime,
    data varbinary(400)
)

it didn't have an index by datetime, but i know that dt non-decreasing sequence. I need to query data from this table with specific condition by date field like this:

select * 
from to_much_data
where dt between '20190220' and '20190221'

since no index for dt, i prefer to convert query to:

select * 
from to_much_data
where id between StartDateID and EndDateID 

I believe that StartDateID and EndDateID could be found with log(N) or better complexity. But I didn't know any solution to do this.

Does any one know the way to do it?

UPD

It looks like no wide-known ready to use solution exists. If index creation is not possible some workarounds can be used:

  • filtered index, but it can affect table performance and increase locks
  • another table with mapping, but it needs to be manually updated (or through trigger or stored procedure) and it can affect performance and increase locks
  • t-sql code with silly binary search but it looks like bicycle reinventing

despite this I believe that databases can be more effective and intuitive in some cases like this. I will be glad if some day we will be able to write:

select * 
from to_much_data with(sequence_order(id asc, dt asc))
where dt between '20190220' and '20190221'
like image 317
gabba Avatar asked Feb 21 '19 13:02

gabba


People also ask

How do I sort a specific order in SQL?

The ORDER BY statement in SQL is used to sort the fetched data in either ascending or descending according to one or more columns. By default ORDER BY sorts the data in ascending order. We can use the keyword DESC to sort the data in descending order and the keyword ASC to sort in ascending order.

How do I search a specific data in SQL?

Search object in all online SQL databasesOn the home page of the object explorer, enter the object name and search. In the result below, you see that a specified object exists in multiple databases. You can browse to the specified object in the database using the object explorer.

How do I select a specific data from a column in SQL?

The syntax is: SELECT column1, column2 FROM table1, table2 WHERE column2='value'; In the above SQL statement: The SELECT clause specifies one or more columns to be retrieved; to specify multiple columns, use a comma and a space between column names.

How do I search for a specific string in a column in SQL?

SQL Server CHARINDEX() Function The CHARINDEX() function searches for a substring in a string, and returns the position. If the substring is not found, this function returns 0. Note: This function performs a case-insensitive search.


2 Answers

You could just reproduce the binary search algorithm in TSQL or use a recursive CTE but this is still going to need greater than 70 seeks to get both ends and is tedious to do.

A possible middle ground might be to create an indexed view with at least every nth row. For example

CREATE VIEW dbo.to_much_data_Sample
WITH SCHEMABINDING
AS
  SELECT id,
         dt
  FROM   dbo.to_much_data
  WHERE  id % 100000 = 0

GO

CREATE UNIQUE CLUSTERED INDEX ix
  ON dbo.to_much_data_Sample(dt, id);

you can then use (assuming id is an integer)

DECLARE @StartDate DATETIME = '20190220',
        @EndDate   DATETIME = '20190221';

DECLARE @StartDateID INT,
        @EndDateID   INT;

SELECT TOP 1 @StartDateID = id
FROM   dbo.to_much_data_Sample WITH (NOEXPAND)
WHERE  dt < @StartDate
ORDER  BY dt DESC;

SELECT TOP 1 @EndDateID = id
FROM   dbo.to_much_data_Sample WITH (NOEXPAND)
WHERE  dt > @EndDate
ORDER  BY dt ASC;

SELECT *
FROM   to_much_data
WHERE  id BETWEEN isnull(@StartDateID, -2147483648) AND isnull(@EndDateID, 2147483647)
       AND dt BETWEEN @StartDate AND @EndDate; 

enter image description here

The value of n will be a trade off between index size and number of additional rows read at runtime.

like image 60
Martin Smith Avatar answered Oct 24 '22 07:10

Martin Smith


As long as the ID on the Too_Much_Data table is an identity, this could be a solution for you:

CREATE TABLE MaxIdForDate (
        d DATE
    ,   id INT --match datatype of to_much_data's pk
)

CREATE INDEX IX_MaxIdForDate_d_id ON MaxIdForDate(d,id)
GO

--Nightly stored procedure does this
INSERT INTO MaxIdForDate(d,id)
    SELECT
            CONVERT(DATE,tmd.dt) AS d
        ,   MAX(tmd.id) AS id
    FROM to_much_data tmd
    WHERe tmd.id > (
        SELECT MAX(id)
        FROM MaxIdForDate mx
    )
    AND CONVERT(DATE,tmd.dt)<CONVERT(DATE,GETDATE())
    GROUP BY CONVERT(DATE,tmd.dt)
GO

--New Query
    DECLARE @StartDate DATE='02/20/2019'
    DECLARE @EndDate DATE='02/21/2019'

    select tmd.* 
    from to_much_data tmd
    WHERE tmd.id > (SELECT id FROM MaxIdForDate WHERE d=DATEADD(DAY,-1,@StartDate))
    and tmd.id <= (SELECT id FROM MaxIdForDate WHERE d=@EndDate)
like image 24
UnhandledExcepSean Avatar answered Oct 24 '22 08:10

UnhandledExcepSean