I am trying to group a list of DTOs which contain alternate family pairs to group them in the following format to minimize duplication.
Here is the DTO structure which I have currently which has duplicate rows as you can see which can be grouped together based on reverse relation also.
+----------+------------+-----------+
| PersonId | RelativeId | Relation |
+----------+------------+-----------+
| 1 | 2 | "Son" |
| 2 | 1 | "Father" |
| 1 | 3 | "Mother" |
| 3 | 1 | "Son" |
| 2 | 3 | "Husband" |
| 3 | 2 | "Wife" |
+----------+------------+-----------+
into something like this:
+----------+------------+-----------+-----------------+
| PersonId | RelativeId | Relation | ReverseRelation |
+----------+------------+-----------+-----------------+
| 1 | 2 | "Son" | "Father" |
| 1 | 3 | "Mother" | "Son" |
| 2 | 3 | "Husband" | "Wife" |
+----------+------------+-----------+-----------------+
Code which I am trying:
Program.cs
class Program
{
static void Main(string[] args)
{
List<RelationDTO> relationDTOList = new List<RelationDTO>
{
new RelationDTO { PersonId = 1, RelativeId = 2, Relation = "Son" },
new RelationDTO { PersonId = 2, RelativeId = 1, Relation = "Father" },
new RelationDTO { PersonId = 1, RelativeId = 3, Relation = "Mother" },
new RelationDTO { PersonId = 3, RelativeId = 1, Relation = "Son" },
new RelationDTO { PersonId = 2, RelativeId = 3, Relation = "Husband" },
new RelationDTO { PersonId = 3, RelativeId = 2, Relation = "Wife" },
};
var grp = relationDTOList.GroupBy(x => new { x.PersonId }).ToList();
}
}
RelationDTO.cs
public class RelationDTO
{
public int PersonId { get; set; }
public int RelativeId { get; set; }
public string Relation { get; set; }
}
Relations.cs
public class Relations
{
public int PersonId { get; set; }
public int RelativeId { get; set; }
public string Relation { get; set; }
public string ReverseRelation { get; set; }
}
GroupBy allows you to quickly group collections of related data by specific properties on your data. The grouped data is then arranged by sub-collections of items in those groups. Note: LINQ provides variants of each method in this article that work with either IEnumerable or IQueryable .
Language-Integrated Query (LINQ) is the name for a set of technologies based on the integration of query capabilities directly into the C# language. Traditionally, queries against data are expressed as simple strings without type checking at compile time or IntelliSense support.
You can use a join operation like
var result = relationDTOList
.Where(v => v.PersonId < v.RelativeId)
.Join(
relationDTOList.Where(v => v.PersonId > v.RelativeId),
v => new Key{PersonId = v.PersonId, RelativeId = v.RelativeId},
v => new Key{PersonId = v.RelativeId, RelativeId = v.PersonId},
(p, q) => new Relations
{
PersonId = p.PersonId,
RelativeId = p.RelativeId,
Relation = p.Relation,
ReverseRelation = q.Relation
}
);
The Key
is:
public struct Key
{
public int PersonId { get; set; }
public int RelativeId { get; set; }
}
I'm not sure whether it is what you need:
public static void Main()
{
List<RelationDTO> relationDTOList = new List<RelationDTO>
{
new RelationDTO { PersonId = 1, RelativeId = 2, Relation = "Son" },
new RelationDTO { PersonId = 2, RelativeId = 1, Relation = "Father" },
new RelationDTO { PersonId = 1, RelativeId = 3, Relation = "Mother" },
new RelationDTO { PersonId = 3, RelativeId = 1, Relation = "Son" },
new RelationDTO { PersonId = 2, RelativeId = 3, Relation = "Husband" },
new RelationDTO { PersonId = 3, RelativeId = 2, Relation = "Wife" },
};
var grp = relationDTOList.Join(relationDTOList,
dto => dto.PersonId + "-" + dto.RelativeId,
dto => dto.RelativeId + "-" + dto.PersonId,
(dto1, dto2) => new Relations
{
PersonId = dto1.PersonId,
RelationId = dto1.RelativeId,
Relation = dto1.Relation,
ReverseRelation = dto2.Relation
}).Distinct(new MyEqualityComparer());
foreach (var g in grp)
Console.WriteLine("{0},{1},{2},{3}", g.PersonId, g.RelationId, g.Relation, g.ReverseRelation);
}
public class MyEqualityComparer : IEqualityComparer<Relations>
{
public bool Equals(Relations x, Relations y)
{
return x.PersonId + "-" + x.RelationId == y.PersonId + "-" + y.RelationId ||
x.PersonId + "-" + x.RelationId == y.RelationId + "-" + y.PersonId;
}
public int GetHashCode(Relations obj)
{
return 0;
}
}
I doubt a bit that LINQ is the best choice here as a loop with lookup might be a bit more efficient. However if you really need LINQ, then you could do the following
var relations = from person in relationDTOList
// Match on the exact pair of IDs
join relative in relationDTOList on
new { person.PersonId, person.RelativeId } equals
new { PersonId = relative.RelativeId, RelativeId = relative.PersonId }
// Build the new structure
let relation = new Relations {
PersonId = person.PersonId,
Relation = person.Relation,
RelativeId = relative.PersonId,
ReverseRelation = relative.Relation
}
// Order the pairs to find the duplicates
let ids = new[] {person.PersonId, relative.PersonId}.OrderBy(x => x).ToArray()
group relation by new { FirstPersonId = ids[0], SecondPersonId = ids[1] }
into relationGroups
// Select only the the first of two duplicates
select relationGroups.First();
What this code does is joins the collection with itself on the matching pairs PersonId
, RelativeId
and then filters out the second record of each pair thus resulting in a collection where the first person found in the list will be considered as parent in the relation.
EDIT: The lookup method I was talking about:
var result = new List<Relations>();
while (relationDTOList.Any())
{
var person = relationDTOList.First();
relationDTOList.RemoveAt(0);
var relative = relationDTOList.Where(x =>
x.PersonId == person.RelativeId && x.RelativeId == person.PersonId)
.Select((x, i) => new {Person = x, Index = i}).FirstOrDefault();
if (relative != null)
{
relationDTOList.RemoveAt(relative.Index);
result.Add(new Relations {
PersonId = person.PersonId,
Relation = person.Relation,
RelativeId = relative.Person.PersonId,
ReverseRelation = relative.Person.Relation
});
}
}
As a note, it empties your original list so you have to make a copy (list.ToList()
) if you need it further in your code.
Running this code turned out to be about six times faster than the method with join
I posted before. I also came up with the following grouping method which runs much faster than the join, however it's still slower than the lookup-and-remove method although they do a very similar thing.
var relations = relationDTOList.GroupBy(person =>
person.PersonId < person.RelativeId
? new {FirstPersonId = person.PersonId, SecondPersonId = person.RelativeId}
: new {FirstPersonId = person.RelativeId, SecondPersonId = person.PersonId})
.Select(group => new Relations {
PersonId = group.First().PersonId,
Relation = group.First().Relation,
RelativeId = group.First().RelativeId,
ReverseRelation = group.Last().Relation
});
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With