Tricky string transformation (hopefully) in LINQ

Tags: string, c#, linq

I'm hoping for a concise way to transform song lyrics. The input will look something like this:

Verse 1 lyrics line 1
Verse 1 lyrics line 2
Verse 1 lyrics line 3
Verse 1 lyrics line 4

Verse 2 lyrics line 1
Verse 2 lyrics line 2
Verse 2 lyrics line 3
Verse 2 lyrics line 4

I want to transform it so that the first lines of all verses are grouped together, then the second lines, and so on:

Verse 1 lyrics line 1
Verse 2 lyrics line 1

Verse 1 lyrics line 2
Verse 2 lyrics line 2

Verse 1 lyrics line 3
Verse 2 lyrics line 3

Verse 1 lyrics line 4
Verse 2 lyrics line 4

The lyrics themselves are not known in advance, but a blank line always marks the division between verses in the input.

Larsenal asked Mar 21 '10 03:03


3 Answers

I have a few extension methods I always keep around that make this type of processing very simple. The complete solution is longer than the others, but the methods are useful in their own right, and once they are in place the answer itself is short and easy to read.

First, there's a Zip method that takes an arbitrary number of sequences, advances them in lockstep, and stops as soon as the shortest one runs out:

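using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Combines the nth item of every sequence into one result per position.
    public static IEnumerable<T> Zip<T>(
        this IEnumerable<IEnumerable<T>> sequences,
        Func<IEnumerable<T>, T> aggregate)
    {
        var enumerators = sequences.Select(s => s.GetEnumerator()).ToArray();
        try
        {
            // All() short-circuits, so iteration ends at the first exhausted sequence.
            while (enumerators.All(e => e.MoveNext()))
            {
                // Snapshot the current items so the aggregate sees stable
                // values even if it enumerates them later.
                var items = enumerators.Select(e => e.Current).ToArray();
                yield return aggregate(items);
            }
        }
        finally
        {
            foreach (var enumerator in enumerators)
            {
                enumerator.Dispose();
            }
        }
    }
}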
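
To make the stopping behavior concrete, here is a minimal, self-contained sketch of the same n-ary zip applied to sequences of unequal length. The names ZipSketch, ZipAll and Demo are hypothetical and exist only for this illustration; ZipAll is a renamed restatement of the method above, chosen so it cannot be confused with the built-in Enumerable.Zip that arrived in .NET 4.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipSketch
{
    // Restated from the answer under a hypothetical name so this sample compiles alone.
    public static IEnumerable<T> ZipAll<T>(
        this IEnumerable<IEnumerable<T>> sequences,
        Func<IEnumerable<T>, T> aggregate)
    {
        var enumerators = sequences.Select(s => s.GetEnumerator()).ToArray();
        try
        {
            // Ends as soon as any one sequence has no more items.
            while (enumerators.All(e => e.MoveNext()))
            {
                yield return aggregate(enumerators.Select(e => e.Current).ToArray());
            }
        }
        finally
        {
            foreach (var e in enumerators) e.Dispose();
        }
    }

    // Zips three rows; the middle row is one item shorter than the others.
    public static List<string> Demo()
    {
        IEnumerable<IEnumerable<string>> rows = new[]
        {
            new[] { "a1", "a2", "a3" },
            new[] { "b1", "b2" },
            new[] { "c1", "c2", "c3" },
        };
        return rows.ZipAll(column => string.Join("+", column)).ToList();
    }

    public static void Main()
    {
        foreach (var s in Demo()) Console.WriteLine(s);
    }
}
```

Run as written, this should print a1+b1+c1 and a2+b2+c2 and nothing else: the third position is dropped because the middle row has no third item. For lyrics, that means verses of different lengths are truncated to the shortest verse.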

Then there's a Split method which does roughly the same thing to an IEnumerable<T> that string.Split does to a string:

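// Also belongs in the static EnumerableExtensions class above.
// Yields one group per run of items; an item that satisfies splitCondition
// ends the current group and is itself dropped.
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items,
    Predicate<T> splitCondition)
{
    using (IEnumerator<T> enumerator = items.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            // ToArray drains the group immediately, so the shared enumerator
            // has consumed the separator before the next group starts.
            yield return GetNextItems(enumerator, splitCondition).ToArray();
        }
    }
}

// Expects the enumerator to already be positioned on the group's first item.
private static IEnumerable<T> GetNextItems<T>(IEnumerator<T> enumerator,
    Predicate<T> stopCondition)
{
    do
    {
        T item = enumerator.Current;
        if (stopCondition(item))
        {
            yield break;
        }
        yield return item;
    } while (enumerator.MoveNext());
}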
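
As a quick check of how this Split treats separators, the following self-contained sketch reports the size of each group it produces. SplitSketch, SplitOn, TakeGroup and GroupSizes are hypothetical names used only here; SplitOn and TakeGroup restate the two methods above so the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitSketch
{
    // Restated from the answer under hypothetical names.
    public static IEnumerable<IEnumerable<T>> SplitOn<T>(
        this IEnumerable<T> items, Predicate<T> splitCondition)
    {
        using (IEnumerator<T> enumerator = items.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                yield return TakeGroup(enumerator, splitCondition).ToArray();
            }
        }
    }

    private static IEnumerable<T> TakeGroup<T>(
        IEnumerator<T> enumerator, Predicate<T> stopCondition)
    {
        do
        {
            T item = enumerator.Current;
            if (stopCondition(item)) yield break;
            yield return item;
        } while (enumerator.MoveNext());
    }

    // Splits on empty strings and renders the group sizes, comma-separated.
    public static string GroupSizes(params string[] lines)
    {
        var sizes = lines
            .SplitOn(s => string.IsNullOrEmpty(s))
            .Select(g => g.Count().ToString());
        return string.Join(",", sizes);
    }

    public static void Main()
    {
        Console.WriteLine(GroupSizes("a", "b", "", "c"));  // one separator
        Console.WriteLine(GroupSizes("a", "", "", "b"));   // two separators in a row
        Console.WriteLine(GroupSizes("a", ""));            // trailing separator
    }
}
```

This should print 2,1 then 1,0,1 then 1. A trailing separator adds no extra group, but two separators in a row produce an empty group. That matters for the lyrics case: the Zip shown earlier stops as soon as any sequence is exhausted, so a doubled blank line between verses would leave it with nothing to yield.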

Once you have these extensions in place, solving the song-lyric problem is a piece of cake:

string lyrics = ...
var verseGroups = lyrics
    .Split(new[] { Environment.NewLine }, StringSplitOptions.None)  // string.Split: one entry per line
    .Select(s => s.Trim())  // Optional, if there might be whitespace
    .Split(s => string.IsNullOrEmpty(s))  // custom Split: one group per verse
    .Zip(seq => string.Join(Environment.NewLine, seq.ToArray()))  // custom Zip: nth line of every verse
    .Select(s => s + Environment.NewLine);  // Optional, adds a blank line between groups
Aaronaught answered Nov 01 '22 02:11


LINQ is so sweet... I just love it.

static void Main(string[] args)
{
    var lyrics = @"Verse 1 lyrics line 1 
                   Verse 1 lyrics line 2 
                   Verse 1 lyrics line 3 
                   Verse 1 lyrics line 4 

                   Verse 2 lyrics line 1 
                   Verse 2 lyrics line 2 
                   Verse 2 lyrics line 3 
                   Verse 2 lyrics line 4";
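    var x = 0;  // position within the current verse; the query mutates it as a side effect
    var indexed = from lyric in lyrics.Split(new[] { Environment.NewLine },
                                             StringSplitOptions.None)
                  let line = lyric.Trim()
                  // A blank line resets the counter; any other line takes the next position.
                  let indx = line == string.Empty ? (x = 0) : ++x
                  where line != string.Empty
                  group line by indx;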

    foreach (var trans in indexed)
    {
        foreach (var item in trans)
            Console.WriteLine(item);
        Console.WriteLine();
    }
    /*
        Verse 1 lyrics line 1
        Verse 2 lyrics line 1

        Verse 1 lyrics line 2
        Verse 2 lyrics line 2

        Verse 1 lyrics line 3
        Verse 2 lyrics line 3

        Verse 1 lyrics line 4
        Verse 2 lyrics line 4
     */
}
Matthew Whited answered Nov 01 '22 04:11


There is probably a more concise way to do this, but here's one solution that works given valid input:

// Requires System.Linq and System.Text.RegularExpressions.
var output = String.Join("\r\n\r\n", // join it all in the end
    Regex.Split(input, "\r\n\r\n") // split on blank lines
        .Select(v => Regex.Split(v, "\r\n")) // now split lines in each verse
        // flatten things out, but attach line number
        .SelectMany(vl => vl.Select((lyrics, i) => new { Line = i, Lyrics = lyrics }))
        .GroupBy(b => b.Line) // group by line number
        .Select(g => String.Join("\r\n", g.Select(f => f.Lyrics).ToArray()))
        .ToArray());

Obviously this is pretty ugly. Not at all a suggestion for production code.
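
For comparison, the same split, index and group idea can be sketched with string.Split in place of Regex. This is only an illustration under stated assumptions: LyricsSketch and Regroup are hypothetical names, line endings are normalized to "\n" first, and verses are assumed to be separated by a single blank line.

```csharp
using System;
using System.Linq;

public static class LyricsSketch
{
    // Groups the nth line of every verse together; groups are separated by a blank line.
    public static string Regroup(string input)
    {
        const string nl = "\n";

        // Normalize Windows line endings, then cut into verses and lines.
        var verses = input.Replace("\r\n", nl)
            .Split(new[] { nl + nl }, StringSplitOptions.RemoveEmptyEntries)
            .Select(v => v.Split(new[] { nl }, StringSplitOptions.RemoveEmptyEntries));

        var groups = verses
            // Tag every line with its position inside its verse.
            .SelectMany(lines => lines.Select((text, i) => new { Line = i, Text = text }))
            // GroupBy keeps groups in order of first appearance of each key.
            .GroupBy(x => x.Line, x => x.Text)
            .Select(g => string.Join(nl, g));

        return string.Join(nl + nl, groups);
    }

    public static void Main()
    {
        Console.WriteLine(Regroup("a1\r\na2\r\n\r\nb1\r\nb2"));
    }
}
```

For the input in Main this should print a1 and b1, a blank line, then a2 and b2. Because it groups by index instead of zipping, leftover lines from a longer verse are kept in their own trailing groups.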

Larsenal answered Nov 01 '22 03:11