I was asked the following: I have a table (lets call it tbl) and it has one column of type int (let call it num) and it had sequential numbers in it:
num
---
1
2
4
5
6
8
9
11
Now you need to write a query that returns the first missing number (in this example that answer would be 3).
here's my answer (works):
select top 1 (num + 1)
from tbl
where (num + 1) not in (select num from tbl)
after writing this, I was asked, what if tbl contained 10 million records - how would you improve performence (because obviously myinner query would cause a full table scan).
My thoughts were about an index in on the num field and doing a not exists. but I would love to hear some alternatives.
In SQL Server 2012+, I would simply use lead()
:
select num
from (select num, lead(num) over (order by num) as next_num
from tbl
) t
where next_num <> num + 1;
However, I suspect that your version would have the best performance if you have an index on tbl(num)
. The not exists
version is worth testing:
select top 1 (num + 1)
from tbl t
where not exists (select 1 from tbl t2 where t2.num = t.num + 1);
The only issue is getting the first number. You are not guaranteed that the table is read "in order". So, this will return one number. With an index (or better yet, clustered index) on num
, the following should be fast and guaranteed to return the first number:
select top 1 (num + 1)
from tbl t
where not exists (select 1 from tbl t2 where t2.num = t.num + 1)
order by num;
Here is another approach using ROW_NUMBER
:
SQL Fiddle
;WITH CteRN AS(
SELECT *,
RN = num - ROW_NUMBER() OVER(ORDER BY num)
FROM tbl
)
SELECT TOP 1 num - RN
FROM CteRN
WHERE RN > 0
ORDER BY num ASC
With a proper INDEX
on num
, here is the stats under a million row test harness.
Original - NOT IN : CPU time = 296 ms, elapsed time = 289 ms
wewesthemenace : CPU time = 0 ms, elapsed time = 0 ms
notulysses(NOT EXISTS): CPU time = 687 ms, elapsed time = 679 ms.
All the above answers are accurate alternatively if we can make an table like date dimension that has all sequential number starting from 1 to n.
insert into sequenceNumber
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11 union all
select 12 union all
select 13
select * from sequenceNumber
RESULT:
1 2 3 4 5 6 7 8 9 10 11 12 13
Now we can use below query to find out missing number.
select a.num,b.num
from sequenceNumber a
left outer join tbl b on a.num = b.num
where b.num is null
Hope it will be useful.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With