Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding continuous ranges in a set of numbers

I have a reasonably large set of phone numbers (approximately 2 million) in a database table. These numbers have been inserted in blocks, so there are many continuous ranges of numbers, anything from 10 numbers to 10 thousand in a range. Some of these numbers are in use and are therefore marked as unavailable, the rest are available. Given a particular number I need a way to find continuous ranges of numbers, both above and below that number. The range should continue until it finds an unavailable number, or encounters the boundary of two ranges.

For example given the following set:

1000
1001
1002
1010
1011
1012
1013
1020
1021
1022

Doing a search using 1012 as the parameter should return 1010, 1011, 1012, 1013.

What is a good way of forming a query to find these ranges? We use NHibernate on top of SQL server, a solution using either is fine.

like image 242
Jack Ryan Avatar asked Jun 29 '10 09:06

Jack Ryan


2 Answers

Theoretically the items in a set have no particular value, so I'm assuming you also have some continuous ID column that defines the order of the numbers. Something like this:

ID  Number
1   1000
2   1001
3   1002
4   1010
5   1011
6   1012
7   1013
8   1020
9   1021
10  1022

You could create an extra column that contains the result of Number - ID:

ID  Number  Diff
1   1000    999
2   1001    999
3   1002    999
4   1010    1006
5   1011    1006
6   1012    1006
7   1013    1006
8   1020    1012
9   1021    1012
10  1022    1012

Numbers in the same range will have the same result in the Diff column.

like image 60
Niels van der Rest Avatar answered Nov 09 '22 13:11

Niels van der Rest


SQL can't really do this in a single query (except there are native SQL enhancements I don't know about), because SQL can't access the row 'before' or 'after'.

You need to go through the sequence in a loop.

You may try NHibernates Enumerable, which doesn't load the entities into memory, but only creates proxies of them. Actually I don't think that it is a good idea, because it will create proxies for the whole 2 million numbers.

Plan B, use paging. Roughly, it looks like this:

List<PhoneNumber> result = new List<PhoneNumber>();

int input = 1012;
int pageSize = 100;
int currentPage = 0;
int expectedNumber = input;

bool carryOn = true;

while(carryOn)
{
  var numbers = session
    .CreateQuery("from PhoneNumber pn where pn.Number > :input")
    .SetInt("input", input)
    .SetFirstResult(currentPage * pageSize)
    .SetMaxResult(pageSize)
    .List<PhoneNumbers>();

  foreach(var number in numbers)
  {
    expectNumber++;
    if (number.Number != expectedNumber) 
    {
      carryOn = false;
      break;
    }
    result.Add(number);
  }

  currentPage++;
}

And the same for the range before in the other direction.

like image 41
Stefan Steinegger Avatar answered Nov 09 '22 14:11

Stefan Steinegger