Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Run Length Encoding in Python with List Comprehension

I have a more basic Run Length Encoding question compared to many of the questions about this topic that have already been answered. Essentially, I'm trying to take the string

string = 'aabccccaaa'

and have it return

a2b1c4a3

I thought that if I can manage to get all the information into a list like I have illustrated below, I would easily be able to return a2b1c4a3

test = [['a','a'], ['b'], ['c','c','c','c'], ['a','a','a']]

I came up with the following code so far, but was wondering if someone would be able to help me figure out how to make it create the output I illustrated above.

def string_compression():
    for i in xrange(len(string)):
        prev_item, current_item = string[i-1], string[i]
        print prev_item, current_item
        if prev_item == current_item:
            <HELP>

If anyone has any additional comments regarding more efficient ways to go about solving a question like this I am all ears!

like image 531
g.humpkins Avatar asked Aug 27 '14 21:08

g.humpkins


1 Answers

You can use itertools.groupby():

from itertools import groupby

grouped = [list(g) for k, g in groupby(string)]

This will produce your per-letter groups as a list of lists.

You can turn that into a RLE in one step:

rle = ''.join(['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)])

Each k is the letter being grouped, each g an iterator producing N times the same letter; the sum(1 for _ in g) expression counts those in the most efficient way possible.

Demo:

>>> from itertools import groupby
>>> string = 'aabccccaaa'
>>> [list(g) for k, g in groupby(string)]
[['a', 'a'], ['b'], ['c', 'c', 'c', 'c'], ['a', 'a', 'a']]
>>> ''.join(['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)])
'a2b1c4a3'
like image 175
Martijn Pieters Avatar answered Oct 01 '22 21:10

Martijn Pieters