Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Zip one IEnumerable with itself

I am implementing some math algorithms based on lists of points, like Distance, Area, Centroid, etc. Just like in this post: Find the distance required to navigate a list of points using linq

That post describes how to calculate the total distance of a sequence of points (taken in order) by essentially zipping the sequence "with itself", generating the sequence for Zip by offsetting the start position of the original IEnumerable by 1.

So, given the Zip extension in .Net 4.0, assuming Point for the point type, and a reasonable Distance formula, you can make calls like this to generate a sequence of distances from one point to the next and then to sum the distances:

var distances = points.Zip(points.Skip(1),Distance);
double totalDistance = distances.Sum();

Area and Centroid calculations are similar in that they need to iterate over the sequence, processing each pair of points (points[i] and points[i+1]). I thought of making a generic IEnumerable extension suitable for implementing these (and possibly other) algorithms that operate over sequences, taking two items at a time (points[0] and points[1], points[1] and points[2], ..., points[n-1] and points[n] (or is it n-2 and n-1 ...) and applying a function.

My generic iterator would have a similar signature to Zip, but it would not receive a second sequence to zip with as it is really just going to zip with itself.

My first try looks like this:

public static IEnumerable<TResult> ZipMyself<TSequence, TResult>(this IEnumerable<TSequence> seq, Func<TSequence, TSequence, TResult> resultSelector)
{
  return seq.Zip(seq.Skip(1),resultSelector);
}

Begin edit: After seeing the responses, I have implemented Pairwise with explicit use of the underlying Enumerator like this:

public static IEnumerable<TResult> Pairwise<TSequence, TResult>(this IEnumerable<TSequence> seq, Func<TSequence, TSequence, TResult> resultSelector)
{
  TSequence prev = default(TSequence);
  using (IEnumerator<TSequence> e = seq.GetEnumerator())
  {
    if (e.MoveNext()) prev = e.Current;

    while (e.MoveNext()) yield return resultSelector(prev, prev = e.Current);
  }
}

While certainly more complicated than my initial version, this one iterates through the input sequence once whereas the the original iterates twice.

End edit

With my generic iterator in place, I can write functions like this:

public static double Length(this IEnumerable<Point> points)
{
  return points.ZipMyself(Distance).Sum();
}

and call it like this:

double d = points.Length();

and

double GreensTheorem(Point p1, Point p1)
{
  return p1.X * p2.Y - p1.Y * p2.X;
}

public static double SignedArea(this IEnumerable<Point> points)
{
  return points.ZipMyself(GreensTheorem).Sum() / 2.0
}

public static double Area(this IEnumerable<Point> points)
{
  return Math.Abs(points.SignedArea());
}

public static bool IsClockwise(this IEnumerable<Point> points)
{
  return SignedArea(points) < 0;
}

and call them like this:

double a = points.Area();
bool isClockwise = points.IsClockwise();

In this case, is there any reason NOT to implement "ZipMyself" in terms of Zip and Skip(1)? Is there already something in LINQ that automates this (zipping a list with itself) - not that it needs to be made that much easier ;-)

Also, is there better name for the extension that might reflect that it is a well-known pattern (if, indeed it is a well-known pattern)?

Had a link here for a StackOverflow question about area calculation. It is question 2432428.

Also had a link to Wikipedia article on Centroid. Just go to Wikipedia and search for Centroid if interested.

Just starting out, so don't have enough rep to post more than one link.

Begin edit

For completeness, if anyone gets here after searching for Distance, Area, or Centroid, here are my functions that accept a list of Position types (assumed closed for Area and Centroid) and return the Distance(along), Area, and Centroid of the Positions:

public struct Position
{
  public double X;
  public double Y;

  static public double Distance(Position p1, Position p2)
  {
    double dx = p2.X - p1.X;
    double dy = p2.Y - p1.Y;
    return Math.Sqrt(dx*dx + dy*dy);
  }
}

public static class PointMath
{
  public static double Distance(IEnumerable<Position> pts)
  {
    return pts.Pairwise((p1, p2) => Position.Distance(p1, p2)).Sum();
  }

  private static bool IsClockwise(IEnumerable<Position> pts)
  {
    return SignedArea(pts) < 0;
  }

  private static double SignedArea(IEnumerable<Position> pts)
  {
    return pts.Pairwise((p1, p2) => (p1.X * p2.Y - p1.Y * p2.X)).Sum() / 2.0;
  }

  public static double Area(IEnumerable<Position> pts)
  {
    return Math.Abs(SignedArea(pts));
  }

  public static Position Centroid(IEnumerable<Position> pts)
  {
    double a = SignedArea(pts);

    var  c = pts.Pairwise((p1, p2) => new 
                                      { 
                                        x = (p1.X + p2.X) * (p1.X * p2.Y - p2.X * p1.Y), 
                                        y = (p1.Y + p2.Y) * (p1.X * p2.Y - p2.X * p1.Y)   
                                      })
                .Aggregate((t1, t2) => new 
                                       { 
                                         x = t1.x + t2.x, 
                                         y = t1.y + t2.y 
                                       });

    return new Position(1.0 / (a * 6.0) * c.x, 1.0 / (a * 6.0) * c.y);
  }
}

Feel free to comment.

End edit

like image 351
wageoghe Avatar asked May 04 '10 20:05

wageoghe


2 Answers

Also, is there better name for the extension that might reflect that it is a well-known pattern (if, indeed it is a well-known pattern)?

Yes - it is also known as Pairwise. It has been done before, for example here. There also has been a question about it before here on SO.

Pairwise can now be implemented in terms of Zip for .NET 4.0, as you point out. This seems like a reasonable approach for a LINQ to Objects solution, although having a version that works on .NET v3.5 too is probably more useful to a wider audience at this point.

like image 135
Mark Byers Avatar answered Oct 19 '22 14:10

Mark Byers


When I did something similar, I called it SelectWithPrevious and had a version that had overloads for both "SelectWithPreviousItem" (took a Func<TSource, TSource, TResult>) and "SelectWithPreviousResult" (took a Func<TResult, TSource, TResult>).

Also, I implemented it by directly storing the last element rather than iterating the sequence twice like the Zip approach does. Having never used LINQ-to-SQL, I can't say for sure, but I wonder if the Zip/Skip approach makes two trips to the server to evaluate a query twice.

like image 21
Gideon Engelberth Avatar answered Oct 19 '22 14:10

Gideon Engelberth