Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Select first missing value in a series

I was asked the following: I have a table (lets call it tbl) and it has one column of type int (let call it num) and it had sequential numbers in it:

num
---
 1
 2
 4
 5
 6
 8
 9
 11

Now you need to write a query that returns the first missing number (in this example that answer would be 3).

here's my answer (works):

select top 1 (num + 1)
from tbl
where (num + 1) not in (select num from tbl)

after writing this, I was asked, what if tbl contained 10 million records - how would you improve performence (because obviously myinner query would cause a full table scan).

My thoughts were about an index in on the num field and doing a not exists. but I would love to hear some alternatives.

like image 986
developer82 Avatar asked Apr 29 '15 03:04

developer82


3 Answers

In SQL Server 2012+, I would simply use lead():

select num
from (select num, lead(num) over (order by num) as next_num
      from tbl
     ) t
where next_num <> num + 1;

However, I suspect that your version would have the best performance if you have an index on tbl(num). The not exists version is worth testing:

select top 1 (num + 1)
from tbl t
where not exists (select 1 from tbl t2 where t2.num = t.num + 1);

The only issue is getting the first number. You are not guaranteed that the table is read "in order". So, this will return one number. With an index (or better yet, clustered index) on num, the following should be fast and guaranteed to return the first number:

select top 1 (num + 1)
from tbl t
where not exists (select 1 from tbl t2 where t2.num = t.num + 1)
order by num;
like image 152
Gordon Linoff Avatar answered Sep 23 '22 01:09

Gordon Linoff


Here is another approach using ROW_NUMBER:

SQL Fiddle

;WITH CteRN AS(
    SELECT *,
        RN = num - ROW_NUMBER() OVER(ORDER BY num)
    FROM tbl
)
SELECT TOP 1 num - RN
FROM CteRN
WHERE RN > 0
ORDER BY num ASC

With a proper INDEX on num, here is the stats under a million row test harness.

Original - NOT IN     : CPU time = 296 ms,  elapsed time = 289 ms
wewesthemenace        : CPU time = 0 ms,  elapsed time = 0 ms
notulysses(NOT EXISTS): CPU time = 687 ms,  elapsed time = 679 ms.
like image 42
Felix Pamittan Avatar answered Sep 23 '22 01:09

Felix Pamittan


All the above answers are accurate alternatively if we can make an table like date dimension that has all sequential number starting from 1 to n.

insert into sequenceNumber 
select 1 union all 
select 2 union all 
select 3 union all 
select 4 union all 
select 5 union all 
select 6 union all 
select 7 union all 
select 8 union all 
select 9 union all 
select 10 union all 
select 11 union all 
select 12 union all 
select 13


select * from sequenceNumber

RESULT:

1 2 3 4 5 6 7 8 9 10 11 12 13

Now we can use below query to find out missing number.

select a.num,b.num 
from sequenceNumber a
left outer join tbl b on a.num = b.num
where b.num is null

Hope it will be useful.

like image 30
Pratik Sharma Avatar answered Sep 24 '22 01:09

Pratik Sharma