
How to select last value from each run of similar items?

I have a list. I'd like to take the last value from each run of similar elements.

What do I mean? Let me give a simple example. Given the list of words

['golf', 'hip', 'hop', 'hotel', 'grass', 'world', 'wee']

And the similarity function 'starting with the same letter', the function would return the shorter list

['golf', 'hotel', 'grass', 'wee']

Why? The original list has a 1-run of G words, a 3-run of H words, a 1-run of G words, and a 2-run of W words. The function returns the last word from each run.

How can I do this?


Hypothetical C# syntax (in reality I'm working with customer objects but I wanted to share something you could run and test yourself)

> var words = new List<string>{"golf", "hip", "hop", "hotel", "grass", "world", "wee"};
> words.LastDistinct(x => x[0])
["golf", "hotel", "grass", "wee"]

Edit: I tried .GroupBy(x => x[0]).Select(g => g.Last()) but that gives ['grass', 'hotel', 'wee'] which is not what I want. Read the example carefully.
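To spell out why GroupBy is not enough here, a small sketch (the class and method names are made up for illustration). GroupBy gathers every element with the same key into a single group no matter where it sits in the list, so the two separate G runs get merged:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByDemo
{
    // The attempt from the edit above: last word of each GroupBy group.
    public static List<string> LastOfEachGroup(IEnumerable<string> words)
    {
        return words.GroupBy(x => x[0]).Select(g => g.Last()).ToList();
    }

    public static void Main()
    {
        var words = new List<string> { "golf", "hip", "hop", "hotel", "grass", "world", "wee" };

        // Print each group to show that position in the list is ignored.
        foreach (var g in words.GroupBy(x => x[0]))
        {
            Console.WriteLine(g.Key + ": " + string.Join(", ", g));
        }
        // g: golf, grass
        // h: hip, hop, hotel
        // w: world, wee

        Console.WriteLine(string.Join(",", LastOfEachGroup(words)));
        // grass,hotel,wee
    }
}
```

What I need instead is grouping that starts a new group whenever the key changes from one element to the next.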


Edit. Another example.

['apples', 'armies', 'black', 'beer', 'bastion', 'cat', 'cart', 'able', 'art', 'bark']

Here there are 5 runs (a run of A's, a run of B's, a run of C's, a new run of A's, a new run of B's). The last word from each run would be:

['armies', 'bastion', 'cart', 'art', 'bark']

The important thing to understand is that each run is independent. Don't mix-up the run of A's at the start with the run of A's near the end.

Colonel Panic asked Nov 27 '13 11:11



2 Answers

There's nothing too complicated with just doing it the old-fashioned way:

Func<string, object> groupingFunction = s => s.Substring(0, 1);
IEnumerable<string> input = new List<string>() {"golf", "hip", "..." };

var output = new List<string>();

if (!input.Any())
{
    return output;
}

var lastItem = input.First();
var lastKey = groupingFunction(lastItem);
foreach (var currentItem in input.Skip(1))
{
    var currentKey = groupingFunction(currentItem);
    if (!currentKey.Equals(lastKey))
    {
        output.Add(lastItem);
    }
    lastKey = currentKey;
    lastItem = currentItem;
}

output.Add(lastItem);

You could also turn this into a generic extension method, as Tim Schmelter has done; I have already taken a couple of steps to generalize the code on purpose (using object as the key type and IEnumerable<string> rather than List<string> as the input type).
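As a rough sketch of that generalization, the loop above might be wrapped like this. The class name RunExtensions is made up, LastDistinct is borrowed from the question's hypothetical syntax, and comparing keys with EqualityComparer<TKey>.Default (so that null keys do not throw) is an assumption on top of the original loop:

```csharp
using System;
using System.Collections.Generic;

public static class RunExtensions
{
    // Hypothetical name, taken from the question's pseudo-syntax.
    public static IEnumerable<T> LastDistinct<T, TKey>(
        this IEnumerable<T> source,
        Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext())
                yield break; // empty input, so there are no runs

            T lastItem = e.Current;
            TKey lastKey = keySelector(lastItem);
            while (e.MoveNext())
            {
                TKey currentKey = keySelector(e.Current);
                // A key change means the previous item closed its run.
                if (!comparer.Equals(currentKey, lastKey))
                    yield return lastItem;
                lastItem = e.Current;
                lastKey = currentKey;
            }
            yield return lastItem; // the final run always ends at the last item
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var words = new List<string> { "golf", "hip", "hop", "hotel", "grass", "world", "wee" };
        Console.WriteLine(string.Join(",", words.LastDistinct(x => x[0])));
        // prints: golf,hotel,grass,wee
    }
}
```

Because it is an iterator, this version enumerates the source only once and yields results lazily, without building an intermediate list.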

Jon answered Oct 07 '22 13:10


You could use this extension method, which groups adjacent (consecutive) elements that share a key. Like any extension method, it has to be declared inside a static class:

public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector)
{
    // Default comparer handles null keys, unlike calling k.Equals(last) directly.
    var comparer = EqualityComparer<TKey>.Default;
    TKey last = default(TKey);
    bool haveLast = false;
    List<TSource> list = new List<TSource>();
    foreach (TSource s in source)
    {
        TKey k = keySelector(s);
        // A key change closes the current run and starts a new one.
        if (haveLast && !comparer.Equals(k, last))
        {
            yield return new GroupOfAdjacent<TSource, TKey>(list, last);
            list = new List<TSource>();
        }
        list.Add(s);
        last = k;
        haveLast = true;
    }
    if (haveLast)
        yield return new GroupOfAdjacent<TSource, TKey>(list, last);
}

public class GroupOfAdjacent<TSource, TKey> : IEnumerable<TSource>, IGrouping<TKey, TSource>
{
    public TKey Key { get; set; }
    private List<TSource> GroupList { get; set; }
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return ((System.Collections.Generic.IEnumerable<TSource>)this).GetEnumerator();
    }
    System.Collections.Generic.IEnumerator<TSource> System.Collections.Generic.IEnumerable<TSource>.GetEnumerator()
    {
        foreach (var s in GroupList)
            yield return s;
    }
    public GroupOfAdjacent(List<TSource> source, TKey key)
    {
        GroupList = source;
        Key = key;
    }
}

Then it's easy:

var words = new List<string>{"golf", "hip", "hop", "hotel", "grass", "world", "wee"};
IEnumerable<string> lastWordOfConsecutiveFirstCharGroups = words
            .GroupAdjacent(str => str[0])
            .Select(g => g.Last());

Output:

string.Join(",", lastWordOfConsecutiveFirstCharGroups); // golf,hotel,grass,wee

Your other sample:

words = new List<string>{"apples", "armies", "black", "beer", "bastion", "cat", "cart", "able", "art", "bark"};
lastWordOfConsecutiveFirstCharGroups = words
            .GroupAdjacent(str => str[0])
            .Select(g => g.Last());

Output:

string.Join(",", lastWordOfConsecutiveFirstCharGroups); // armies,bastion,cart,art,bark

Tim Schmelter answered Oct 07 '22 13:10