Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Run-length encoding of a given sorted string

Write code for run -length encoding of a given string
Sample Input: aaaaaaaaaabcccccc
Output: a10bc6

My code:

static void Main(string[] args)
{
    string str = "aaaaaaaaaabcccccc";
    var qry = (from c in str
               group c by c into grp
               select new
               {
                   output = grp.Key.ToString() + grp.Count().ToString()
               });
    StringBuilder sb = new StringBuilder();
    foreach (var item in qry)
    {
        sb.Append(item.output);
    }
    Console.WriteLine(sb.ToString());
    Console.ReadLine();
}

However it returns:

a10b1c6

I want to remove the count for non-repeating char, here is "1" for letter 'b'.

Assume that it is a sorted string.

like image 249
Hui Zhao Avatar asked Dec 19 '14 20:12

Hui Zhao


2 Answers

add a ternary expression:

output = grp.Key + (grp.Count() > 1 ? grp.Count().ToString() : "")
like image 186
Jonesopolis Avatar answered Sep 28 '22 16:09

Jonesopolis


Although OP did mention as an afterthought that in his/her case his source string was sorted, in general, the input to Run Length encoding won't be sorted as will lose information and can't be decompressed. Here's a take on the more general case of unsorted:

  string str = "aaaaaaaabccccccaadddddaaa"; // a8bc6a2d5a3

  // Zip the string with itself, offset by 1 character. 
  // Duplicate the last char to make strings equal length
  var pairs = str
    .Zip((str + str.Last()).Skip(1),
         (prev, current) => new { prev, current });

  // Retain a horrid mutable sequence which tracks consecutive characters
  var sequence = 0;
  var grps = pairs.GroupBy(p => 
    new { Ch = p.prev, 
          Sequence = p.current == p.prev
          ? sequence 
          : sequence++});

  // Join this together, using the other solutions to drop the count from single chars
  var rle = String.Join("", 
    grps.Select(g => g.Count() > 1
        ? g.Key.Ch.ToString() + g.Count().ToString() 
        : g.Key.Ch.ToString()));
  Console.WriteLine(rle);

Edit
I guess the number comments indicate some violation of the POLA which require explanation:

  • The string is Zipped with itself offset by one (Skip), in order to detect the boundaries of consecutive characters
  • Since Zip stops on the shortest enumeration, the last character is repeated on the shortest string to handle the final character in the string.
  • Unlike the 'sorted' RLE input string in the other answers, the Grouping key is done by the combination of character and a 'are the characters adjacent?' sequencer.
  • This sequence is rather horridly incremented within a conditional in the projection lambda of the GroupBy
  • @Jonesy's / @Tim's conditional join is used within a String.Join to reassemble the final encoded string.
like image 41
StuartLC Avatar answered Sep 28 '22 17:09

StuartLC