Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Detecting sequence of at least 3 sequential numbers from a given list

Tags:

I have a list of numbers e.g. 21,4,7,9,12,22,17,8,2,20,23

I want to be able to pick out sequences of sequential numbers (minimum 3 items in length), so from the example above it would be 7,8,9 and 20,21,22,23.

I have played around with a few ugly sprawling functions but I am wondering if there is a neat LINQ-ish way to do it.

Any suggestions?

UPDATE:

Many thanks for all the responses, much appriciated. Im am currently having a play with them all to see which would best integrate into our project.

like image 510
David Avatar asked Oct 02 '10 06:10

David


People also ask

How do you check if the numbers in a list are consecutive numbers?

We test both conditions, first by finding the iterative difference of the sorted list np. diff(sorted(l)) we can test if there are n consecutive integers. Lastly, we test if the value_counts() are all 1, indicating no repeats. Show activity on this post.

What are the sequential numbers?

Sequential Numbering is a popular feature on custom-printed forms Sequential Numbering, also known as Consecutive Numbering, refers to the printing of ascending or descending identification numbers so that each printed unit receives its own unique number.


2 Answers

It strikes me that the first thing you should do is order the list. Then it's just a matter of walking through it, remembering the length of your current sequence and detecting when it's ended. To be honest, I suspect that a simple foreach loop is going to be the simplest way of doing that - I can't immediately think of any wonderfully neat LINQ-like ways of doing it. You could certainly do it in an iterator block if you really wanted to, but bear in mind that ordering the list to start with means you've got a reasonably "up-front" cost anyway. So my solution would look something like this:

var ordered = list.OrderBy(x => x); int count = 0; int firstItem = 0; // Irrelevant to start with foreach (int x in ordered) {     // First value in the ordered list: start of a sequence     if (count == 0)     {         firstItem = x;         count = 1;     }     // Skip duplicate values     else if (x == firstItem + count - 1)     {         // No need to do anything     }     // New value contributes to sequence     else if (x == firstItem + count)     {         count++;     }     // End of one sequence, start of another     else     {         if (count >= 3)         {             Console.WriteLine("Found sequence of length {0} starting at {1}",                               count, firstItem);         }         count = 1;         firstItem = x;     } } if (count >= 3) {     Console.WriteLine("Found sequence of length {0} starting at {1}",                       count, firstItem); } 

EDIT: Okay, I've just thought of a rather more LINQ-ish way of doing things. I don't have the time to fully implement it now, but:

  • Order the sequence
  • Use something like SelectWithPrevious (probably better named SelectConsecutive) to get consecutive pairs of elements
  • Use the overload of Select which includes the index to get tuples of (index, current, previous)
  • Filter out any items where (current = previous + 1) to get anywhere that counts as the start of a sequence (special-case index=0)
  • Use SelectWithPrevious on the result to get the length of the sequence between two starting points (subtract one index from the previous)
  • Filter out any sequence with length less than 3

I suspect you need to concat int.MinValue on the ordered sequence, to guarantee the final item is used properly.

EDIT: Okay, I've implemented this. It's about the LINQiest way I can think of to do this... I used null values as "sentinel" values to force start and end sequences - see comments for more details.

Overall, I wouldn't recommend this solution. It's hard to get your head round, and although I'm reasonably confident it's correct, it took me a while thinking of possible off-by-one errors etc. It's an interesting voyage into what you can do with LINQ... and also what you probably shouldn't.

Oh, and note that I've pushed the "minimum length of 3" part up to the caller - when you have a sequence of tuples like this, it's cleaner to filter it out separately, IMO.

using System; using System.Collections.Generic; using System.Linq;  static class Extensions {     public static IEnumerable<TResult> SelectConsecutive<TSource, TResult>         (this IEnumerable<TSource> source,          Func<TSource, TSource, TResult> selector)     {         using (IEnumerator<TSource> iterator = source.GetEnumerator())         {            if (!iterator.MoveNext())            {                yield break;            }            TSource prev = iterator.Current;            while (iterator.MoveNext())            {                TSource current = iterator.Current;                yield return selector(prev, current);                prev = current;            }         }     } }  class Test {     static void Main()     {         var list = new List<int> {  21,4,7,9,12,22,17,8,2,20,23 };          foreach (var sequence in FindSequences(list).Where(x => x.Item1 >= 3))         {             Console.WriteLine("Found sequence of length {0} starting at {1}",                               sequence.Item1, sequence.Item2);         }     }      private static readonly int?[] End = { null };      // Each tuple in the returned sequence is (length, first element)     public static IEnumerable<Tuple<int, int>> FindSequences          (IEnumerable<int> input)     {         // Use null values at the start and end of the ordered sequence         // so that the first pair always starts a new sequence starting         // with the lowest actual element, and the final pair always         // starts a new one starting with null. That "sequence at the end"         // is used to compute the length of the *real* final element.         return End.Concat(input.OrderBy(x => x)                                .Select(x => (int?) x))                   .Concat(End)                   // Work out consecutive pairs of items                   .SelectConsecutive((x, y) => Tuple.Create(x, y))                   // Remove duplicates                   .Where(z => z.Item1 != z.Item2)                   // Keep the index so we can tell sequence length                   .Select((z, index) => new { z, index })                   // Find sequence starting points                   .Where(both => both.z.Item2 != both.z.Item1 + 1)                   .SelectConsecutive((start1, start2) =>                         Tuple.Create(start2.index - start1.index,                                      start1.z.Item2.Value));     } } 
like image 125
Jon Skeet Avatar answered Oct 20 '22 15:10

Jon Skeet


Jon Skeet's / Timwi's solutions are the way to go.

For fun, here's a LINQ query that does the job (very inefficiently):

var sequences = input.Distinct()                      .GroupBy(num => Enumerable.Range(num, int.MaxValue - num + 1)                                                .TakeWhile(input.Contains)                                                .Last())  //use the last member of the consecutive sequence as the key                      .Where(seq => seq.Count() >= 3)                      .Select(seq => seq.OrderBy(num => num)); // not necessary unless ordering is desirable inside each sequence. 

The query's performance can be improved slightly by loading the input into a HashSet (to improve Contains), but that will still not produce a solution that is anywhere close to efficient.

The only bug I am aware of is the possibility of an arithmetic overflow if the sequence contains negative numbers of large magnitude (we cannot represent the count parameter for Range). This would be easy to fix with a custom static IEnumerable<int> To(this int start, int end) extension-method. If anyone can think of any other simple technique of dodging the overflow, please let me know.

EDIT: Here's a slightly more verbose (but equally inefficient) variant without the overflow issue.

var sequences = input.GroupBy(num => input.Where(candidate => candidate >= num)                                           .OrderBy(candidate => candidate)                                           .TakeWhile((candidate, index) => candidate == num + index)                                           .Last())                      .Where(seq => seq.Count() >= 3)                      .Select(seq => seq.OrderBy(num => num)); 
like image 33
Ani Avatar answered Oct 20 '22 14:10

Ani