
LINQ expression for shortest common prefix

Tags:

c#

linq

Can anyone help me with a nice LINQ expression for transforming a list of strings into another list containing only the shortest distinct common prefixes of the strings? The delimiter for prefixes is ".".

Example: ["A", "A.B.D", "A", "A.B","E","F.E", "F","B.C"]

Goes to: ["A", "E", "F", "B.C"]

Removed:

  • "A.B.D" and "A.B" because the prefix "A" is already in the list
  • "A" because it is a duplicate
  • "F.E" because "F" is already in the list
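The removal rules above can also be read as a plain filter: keep each distinct string unless one of its shorter dot-separated prefixes is also in the list. The following is a minimal sketch under that reading only; ShortestPrefixes and ProperPrefixes are hypothetical helper names, not part of the question. Under this reading nothing new is synthesized, so "Q.X" and "Q.Y" would both be kept, because "Q" itself is not in the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keeps each distinct string unless one of its proper
// dot-separated prefixes (e.g. "A" or "A.B" for "A.B.D") is also in the list.
static List<string> ShortestPrefixes(IEnumerable<string> items)
{
    var set = new HashSet<string>(items, StringComparer.Ordinal);
    return items
        .Distinct(StringComparer.Ordinal)                       // drop duplicates such as the second "A"
        .Where(s => !ProperPrefixes(s).Any(p => set.Contains(p))) // drop strings whose prefix is present
        .ToList();
}

// Yields "A" and then "A.B" for "A.B.D"; yields nothing for a single token like "A".
static IEnumerable<string> ProperPrefixes(string s)
{
    for (int i = s.IndexOf('.'); i >= 0; i = s.IndexOf('.', i + 1))
        yield return s.Substring(0, i);
}

var input = new[] { "A", "A.B.D", "A", "A.B", "E", "F.E", "F", "B.C" };
Console.WriteLine(string.Join(", ", ShortestPrefixes(input))); // prints: A, E, F, B.C
```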

Thanks!

dave asked Nov 21 '10 at 19:11


4 Answers

Here you go:

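using System;
using System.Linq;

// the question's example input
var list = new[] { "A", "A.B.D", "A", "A.B", "E", "F.E", "F", "B.C" };
var result =
    from set in
        (from item in list select item.Split('.')).GroupBy(x => x[0])
    select
      set.First()
         .TakeWhile((part, index) => set.All(x => x.Length > index && x[index].Equals(part)))
         .Aggregate((x, y) => String.Format("{0}.{1}", x, y));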

By way of explanation:

  1. First, we split all the strings by '.' and group by their first token.
  2. Then, we look at the first element of each grouping, and we take parts from it while every element of that group continues to match (TakeWhile).
  3. Then, we take all those parts and recompose them into a dotted string with Aggregate and String.Format.
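The three steps above can be sketched as a small standalone function. This is a hedged variant, not jtdubs's exact code: CommonPrefixes is a hypothetical name, and string.Join stands in for the Aggregate/String.Format step (it produces the same dotted string). Because it keeps the longest prefix shared by every string in a first-token group, it returns "Q" for "Q.X" and "Q.Y".

```csharp
using System;
using System.Linq;

// Hypothetical wrapper around the query above.
static string[] CommonPrefixes(string[] items) =>
    items.Select(item => item.Split('.'))        // 1. split each string into tokens...
         .GroupBy(parts => parts[0])             //    ...and group by the first token
         .Select(set => string.Join(".",         // 3. rejoin the kept tokens with '.'
             set.First().TakeWhile((part, index) =>              // 2. walk the first element's tokens
                 set.All(x => x.Length > index && x[index] == part)))) // while every member matches
         .ToArray();

var list = new[] { "A", "A.B.D", "A", "A.B", "E", "F.E", "F", "B.C" };
Console.WriteLine(string.Join(", ", CommonPrefixes(list))); // prints: A, E, F, B.C
```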
jtdubs answered Sep 28 '22 at 23:09


    using System.Linq;

    var items = new[] { "A", "A.B.D", "A", "A.B", "E", "F.E", "F", "B.C" };
    var result = items
        .OrderBy(s => s.Length)
        .Distinct()
        .ToLookup(s => s.Substring(0, 1))
        .Select(g => g.First());

Order the items by length, call Distinct to remove duplicates, group them by their first character, and select the first item in each group.

Yields: "A", "E", "F", "B.C"

Edit: You probably don't even need Distinct: you're selecting only the first item in each group anyway, so it's redundant.
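A caveat about the query above: Substring(0, 1) groups by the first character, not the first dot-separated token, so a hypothetical extra entry such as "AB" would land in the "A" group and be dropped (and an empty string would make Substring throw). A sketch of the same idea that groups by the first token instead, with Distinct removed as the edit suggests. Like the original, it returns the shortest member of each group rather than a computed common prefix, so "Q.X" and "Q.Y" would give "Q.X".

```csharp
using System;
using System.Linq;

// "AB" is a hypothetical extra entry, added to show the grouping difference.
var items = new[] { "A", "A.B.D", "A", "A.B", "E", "F.E", "F", "B.C", "AB" };
var result = items
    .OrderBy(s => s.Length)              // shortest first; OrderBy is a stable sort
    .GroupBy(s => s.Split('.')[0])       // group by first token, not first character
    .Select(g => g.First())              // shortest member of each group; duplicates don't matter
    .ToList();

Console.WriteLine(string.Join(", ", result)); // prints: A, E, F, AB, B.C
```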

Matthew Abbott answered Sep 28 '22 at 21:09


EDIT: Thanks to the commenters for pointing out a bug in my earlier approach.

To get around that shortcoming, this query should work:

using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string> { "A.B.D", "A", "A.B", "E", "F.E", "F", "B.C", "B.C.D" };
var result = list.OrderBy(s => s)
                 .GroupBy(s => s[0])
                 .Select(g => g.First());

foreach (var s in result)
{
    Console.WriteLine(s);
}

Incorrect approach:

The following query groups each string by its first character. Then, if a group has more than one item, its key is selected; otherwise, the single item is selected.

var list = new List<string> { "A", "A.B.D", "A", "A.B", "E", "F.E", "F", "B.C" };
var result = list.GroupBy(s => s[0])
                 .Select(g => g.Count() > 1 ? g.Key.ToString() : g.Single());

foreach (var s in result)
{
    Console.WriteLine(s);
}
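One input that exposes the problem with the incorrect approach: when a group has several members but none of them is the bare first character, the query returns that character anyway. The sketch runs the same query on the question's data plus the "B.C.D" entry used in the corrected version; the corrected query returns "B.C" for that group.

```csharp
using System;
using System.Linq;

// The "incorrect approach" query, unchanged apart from ToList().
var list = new[] { "A", "A.B.D", "A", "A.B", "E", "F.E", "F", "B.C", "B.C.D" };
var result = list.GroupBy(s => s[0])
                 .Select(g => g.Count() > 1 ? g.Key.ToString() : g.Single())
                 .ToList();

// The 'B' group holds "B.C" and "B.C.D", so the bare key "B" is returned,
// even though "B" is not in the list.
Console.WriteLine(string.Join(", ", result)); // prints: A, E, F, B
```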
Ahmad Mageed answered Sep 28 '22 at 23:09


Nailed it, assuming that if the source list contains "Q.X" and "Q.Y", the result should contain "Q".

using System;
using System.Linq;

var source = new []
{
    "A", "A.B.D", "A",
    "A.B", "E", "F.E",
    "F", "B.C",
    "Q.X", "Q.Y",
    "D.A.A", "D.A.B",
};

Func<string, int> startsWithCount =
    s => source.Where(x => x.StartsWith(s)).Count();

var results =
    (from x in source.Distinct()
    let xx = x.Split('.')
    let splits = Enumerable
        .Range(1, xx.Length)
        .Select(n => String.Join(".", xx.Take(n)))
    let first = startsWithCount(splits.First())
    select splits
        .Where(s => startsWithCount(s) == first)
        .Last()
    ).Distinct();


// results == ["A", "E", "F", "B.C", "Q", "D.A"]
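One caveat with startsWithCount: StartsWith("D") also matches a string like "DA", which does not share the "D" token. With a hypothetical extra "DA" in the source, the original code counts it under "D", so "D.A.A" and "D.A.B" collapse to "D" instead of "D.A". A sketch of a token-aware count (an exact match, or a match on s + "."), swapping in an ordinal comparison; everything else follows the query above.

```csharp
using System;
using System.Linq;

// "DA" is a hypothetical extra entry that shares a first character, not a token, with "D.A.A".
var source = new[] { "A", "A.B.D", "A", "A.B", "E", "F.E", "F", "B.C",
                     "Q.X", "Q.Y", "D.A.A", "D.A.B", "DA" };

// Token-aware count: x matches prefix s only if x == s or x starts with s + ".".
int StartsWithCount(string s) =>
    source.Count(x => x == s || x.StartsWith(s + ".", StringComparison.Ordinal));

var results =
    (from x in source.Distinct()
     let xx = x.Split('.')
     let splits = Enumerable.Range(1, xx.Length)
                            .Select(n => string.Join(".", xx.Take(n)))
     let first = StartsWithCount(splits.First())
     select splits.Where(s => StartsWithCount(s) == first).Last()
    ).Distinct().ToList();

Console.WriteLine(string.Join(", ", results)); // prints: A, E, F, B.C, Q, D.A, DA
```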
Enigmativity answered Sep 28 '22 at 22:09