I have a database which is used to store information about different matches for a game that I pull in from an external source. Due to a few issues, there are occasional gaps (which could be anywhere from 1 missing ID to a few hundred) in the database. I want to have the program pull in the data for the missing games, but I need to get that list first.
Here is the format of the table:
id (pk-identity) | GameID (int) | etc. | etc.
I had thought of writing a program to run through a loop and query for each GameID starting at 1, but it seems like there should be a more efficient way to get the missing numbers.
Is there an easy and efficient way, using SQL Server, to find all the missing numbers from the range?
Syntax to check if the value is an integer. select yourColumnName from yourTableName where yourColumnName REGEXP '^-?[0-9]+$'; The query wherein we have used regular expression. This will output only the integer value.
The idea is to look at where the gaps start. Let me assume you are using SQL Server 2012, and so have the lag()
and lead()
functions. The following gets the next id
:
select t.*, lead(id) over (order by id) as nextid
from t;
If there is a gap, then nextid <> id+1
. You can now characterize the gaps using where
:
select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*, lead(id) over (order by id) as nextid
from t
) t
where nextid <> id+1;
EDIT:
Without the lead()
, I would do the same thing with a correlated subquery:
select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*,
(select top 1 id
from t t2
where t2.id > t.id
order by t2.id
) as nextid
from t
) t
where nextid <> id+1;
Assuming the id
is a primary key on the table (or even that it just has an index), both methods should have reasonable performance.
Numbers table!
CREATE TABLE dbo.numbers (
number int NOT NULL
)
ALTER TABLE dbo.numbers
ADD
CONSTRAINT pk_numbers PRIMARY KEY CLUSTERED (number)
WITH FILLFACTOR = 100
GO
INSERT INTO dbo.numbers (number)
SELECT (a.number * 256) + b.number As number
FROM (
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number <= 255
) As a
CROSS
JOIN (
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number <= 255
) As b
GO
Then you can perform an OUTER JOIN
or EXISTS` between your two tables and find the gaps...
SELECT *
FROM dbo.numbers
WHERE NOT EXISTS (
SELECT *
FROM your_table
WHERE id = numbers.number
)
-- OR
SELECT *
FROM dbo.numbers
LEFT
JOIN your_table
ON your_table.id = numbers.number
WHERE your_table.id IS NULL
I like the "gaps and islands" approach. It goes a little something like this:
WITH Islands AS (
SELECT GameId, GameID - ROW_NUMBER() OVER (ORDER BY GameID) AS [IslandID]
FROM dbo.yourTable
)
SELECT MIN(GameID), MAX(Game_id)
FROM Islands
GROUP BY IslandID
That query will get you the list of contiguous ranges. From there, you can self-join that result set (on successive IslandIDs) to get the gaps. There is a bit of work in getting the IslandIDs themselves to be contiguous though. So, extending the above query:
WITH
cte1 AS (
SELECT GameId, GameId - ROW_NUMBER() OVER (ORDER BY GameId) AS [rn]
FROM dbo.yourTable
)
, cte2 AS (
SELECT [rn], MIN(GameId) AS [Start], MAX(GameId) AS [End]
FROM cte1
GROUP BY [rn]
)
,Islands AS (
SELECT ROW_NUMBER() OVER (ORDER BY [rn]) AS IslandId, [Start], [End]
from cte2
)
SELECT a.[End] + 1 AS [GapStart], b.[Start] - 1 AS [GapEnd]
FROM Islands AS a
LEFT JOIN Islands AS b
ON a.IslandID + 1 = b.IslandID
SELECT * FROM #tab1
id col1
----------- --------------------
1 a
2 a
3 a
8 a
9 a
10 a
11 a
15 a
16 a
17 a
18 a
WITH cte (id,nextId) as
(SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id) AS nextId FROM #tab1 t)
SELECT id AS 'GapStart', nextId AS 'GapEnd' FROM cte
WHERE id + 1 <> nextId
GapStart GapEnd
----------- -----------
3 8
11 15
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With