
Find all integer gaps in SQL

I have a database which is used to store information about different matches for a game that I pull in from an external source. Due to a few issues, there are occasional gaps (which could be anywhere from 1 missing ID to a few hundred) in the database. I want to have the program pull in the data for the missing games, but I need to get that list first.

Here is the format of the table:

id (pk-identity)  |  GameID (int)  |  etc.  |  etc.  

I had thought of writing a program to run through a loop and query for each GameID starting at 1, but it seems like there should be a more efficient way to get the missing numbers.

Is there an easy and efficient way, using SQL Server, to find all the missing numbers from the range?

Nicholas Post asked Sep 09 '13 13:09




4 Answers

The idea is to look at where the gaps start. I will assume you are using SQL Server 2012 or later, and so have the lag() and lead() functions. The following gets the next id for each row:

select t.*, lead(id) over (order by id) as nextid
from t;

If there is a gap, then nextid <> id+1. You can now characterize the gaps using where:

select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*, lead(id) over (order by id) as nextid
      from t
     ) t
where nextid <> id+1;
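
To make the result concrete, here is a small self-contained run of the same pattern. The table variable @t and its sample ids are made up for illustration, and the sketch assumes SQL Server 2012 or later for lead().

```sql
-- Hypothetical sample data: ids 1-3, 8-11 and 15-18 are present.
declare @t table (id int primary key);
insert into @t (id)
values (1), (2), (3), (8), (9), (10), (11), (15), (16), (17), (18);

-- Compare each id with the next id in order; a mismatch marks a gap.
select id + 1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.id, lead(t.id) over (order by t.id) as nextid
      from @t t
     ) t
where nextid <> id + 1   -- the highest id has nextid = NULL, so it is filtered out
order by FirstMissingId;
```

With this data the query returns two rows: 4 to 7 and 12 to 14.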

EDIT:

Without the lead(), I would do the same thing with a correlated subquery:

select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*,
             (select top 1 id
              from t t2
              where t2.id > t.id
              order by t2.id
             ) as nextid
      from t
     ) t
where nextid <> id+1;

Assuming the id is a primary key on the table (or even that it just has an index), both methods should have reasonable performance.
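
Both queries return each gap as a range. If a flat list of individual missing ids is needed, one possible approach is to expand the ranges with a recursive CTE. This is only a sketch: the table variable @t and its data are hypothetical, and it again assumes lead() is available.

```sql
-- Hypothetical sample data: ids 1-3, 8-11 and 15-18 are present.
declare @t table (id int primary key);
insert into @t (id)
values (1), (2), (3), (8), (9), (10), (11), (15), (16), (17), (18);

with gaps as (
    -- Anchor: the first missing id of every gap, plus where that gap ends.
    select id + 1 as MissingId, nextid - 1 as LastMissingId
    from (select t.id, lead(t.id) over (order by t.id) as nextid
          from @t t
         ) t
    where nextid <> id + 1
    union all
    -- Recursive step: walk forward one id at a time until the gap ends.
    select MissingId + 1, LastMissingId
    from gaps
    where MissingId < LastMissingId
)
select MissingId
from gaps
order by MissingId
option (maxrecursion 0);  -- lift the default limit of 100 recursion levels
```

For the sample data this lists 4, 5, 6, 7, 12, 13 and 14. The maxrecursion hint matters here because a gap of a few hundred ids would exceed the default limit of 100.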

Gordon Linoff answered Oct 02 '22 11:10


Numbers table!

CREATE TABLE dbo.numbers (
   number int NOT NULL
)

ALTER TABLE dbo.numbers
ADD
   CONSTRAINT pk_numbers PRIMARY KEY CLUSTERED (number)
     WITH FILLFACTOR = 100
GO

INSERT INTO dbo.numbers (number)
SELECT (a.number * 256) + b.number As number
FROM     (
        SELECT number
        FROM   master..spt_values
        WHERE  type = 'P'
        AND    number <= 255
       ) As a
 CROSS
  JOIN (
        SELECT number
        FROM   master..spt_values
        WHERE  type = 'P'
        AND    number <= 255
       ) As b
GO

Then you can perform an OUTER JOIN or NOT EXISTS between your two tables and find the gaps...

SELECT *
FROM   dbo.numbers
WHERE  NOT EXISTS (
         SELECT *
         FROM   your_table
         WHERE  id = numbers.number
       )

-- OR

SELECT *
FROM   dbo.numbers
 LEFT
  JOIN your_table
    ON your_table.id = numbers.number
WHERE  your_table.id IS NULL
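
Two things to keep in mind with the queries above: the INSERT only generates the numbers 0 through 65535, and every number outside the range of your ids is reported as missing too. A minimal sketch of one way to limit the result to gaps between the lowest and highest existing id follows. The table variables @your_table and @numbers are hypothetical stand-ins for your table and for dbo.numbers.

```sql
-- Hypothetical sample data: ids 1-3, 8-11 and 15-18 are present.
DECLARE @your_table TABLE (id int PRIMARY KEY);
INSERT INTO @your_table (id)
VALUES (1), (2), (3), (8), (9), (10), (11), (15), (16), (17), (18);

-- Small stand-in for dbo.numbers: 0 through 255 is enough for the sample.
DECLARE @numbers TABLE (number int PRIMARY KEY);
INSERT INTO @numbers (number)
SELECT number
FROM   master..spt_values
WHERE  type = 'P'
AND    number <= 255;

-- Only look between the smallest and largest id that actually exist.
SELECT n.number AS MissingId
FROM   @numbers AS n
WHERE  n.number BETWEEN (SELECT MIN(id) FROM @your_table)
                    AND (SELECT MAX(id) FROM @your_table)
AND    NOT EXISTS (
         SELECT *
         FROM   @your_table AS t
         WHERE  t.id = n.number
       )
ORDER  BY n.number;
```

For the sample data this returns 4, 5, 6, 7, 12, 13 and 14. If your ids can exceed 65535, the numbers table needs to be populated with a larger range first.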

gvee answered Oct 02 '22 10:10


I like the "gaps and islands" approach. It goes a little something like this:

WITH Islands AS (
    SELECT GameId, GameID - ROW_NUMBER() OVER (ORDER BY GameID) AS [IslandID]
    FROM dbo.yourTable
)
SELECT MIN(GameID) AS [Start], MAX(GameID) AS [End]
FROM Islands
GROUP BY IslandID

That query will get you the list of contiguous ranges. From there, you can self-join that result set (on successive IslandIDs) to get the gaps. There is a bit of work in getting the IslandIDs themselves to be contiguous though. So, extending the above query:

WITH 
cte1 AS (
    SELECT GameId, GameId - ROW_NUMBER() OVER (ORDER BY GameId) AS [rn]
    FROM dbo.yourTable
)
, cte2 AS (
    SELECT [rn], MIN(GameId) AS [Start], MAX(GameId) AS [End]
    FROM cte1
    GROUP BY [rn]
)
,Islands AS (
    SELECT ROW_NUMBER() OVER (ORDER BY [rn]) AS IslandId, [Start], [End]
  from cte2
)

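-- An inner join drops the last island, which has no following island
-- and would otherwise produce a row with a NULL GapEnd.
SELECT a.[End] + 1 AS [GapStart], b.[Start] - 1 AS [GapEnd]
FROM Islands AS a
INNER JOIN Islands AS b
    ON a.IslandID + 1 = b.IslandID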
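
The grouping works because, within a run of consecutive GameIDs, both the GameID and its row number go up by one per row, so their difference stays constant. A small sketch with made-up data shows the intermediate values. The table variable @games and the column name IslandKey are hypothetical.

```sql
-- Hypothetical sample data: GameIDs 1-3, 8-11 and 15-18 are present.
DECLARE @games TABLE (GameID int PRIMARY KEY);
INSERT INTO @games (GameID)
VALUES (1), (2), (3), (8), (9), (10), (11), (15), (16), (17), (18);

-- Show the row number and the difference that identifies each island.
SELECT GameID,
       ROW_NUMBER() OVER (ORDER BY GameID) AS RowNum,
       GameID - ROW_NUMBER() OVER (ORDER BY GameID) AS IslandKey
FROM @games
ORDER BY GameID;
-- GameID 1, 2, 3        -> IslandKey 0
-- GameID 8, 9, 10, 11   -> IslandKey 4
-- GameID 15, 16, 17, 18 -> IslandKey 7
```

This relies on GameID being unique. If duplicates are possible, the row number keeps increasing across repeated values and the difference is no longer constant within a run, so DENSE_RANK() would be a safer choice than ROW_NUMBER() in that case.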

Ben Thul answered Oct 02 '22 11:10


     SELECT * FROM #tab1
            id          col1
            ----------- --------------------
            1           a
            2           a
            3           a
            8           a
            9           a
            10          a
            11          a
            15          a
            16          a
            17          a
            18          a

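 -- ORDER BY makes TOP 1 pick the smallest id above the current one;
 -- without it, which row TOP 1 returns is not guaranteed.
 WITH cte (id, nextId) AS
                (SELECT t.id,
                        (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id ORDER BY t1.id) AS nextId
                 FROM #tab1 t)

 SELECT id AS 'GapStart', nextId AS 'GapEnd' FROM cte
                WHERE id + 1 <> nextId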

    GapStart    GapEnd
    ----------- -----------
    3           8
    11          15

Vikram Mahapatra answered Oct 02 '22 11:10