
How to collect data from a list into groups based on condition?

Not sure how to title this question. I've run into a few situations where I have a list of data, maybe annotated with some property, and I want to collect them into groups.

For example, maybe I have a file like this:

some event
reading: 25.4
reading: 23.4
reading: 25.1
different event
reading: 22.3
reading: 21.1
reading: 26.0
reading: 25.2
another event
reading: 25.5
reading: 25.1

and I want to group each set of readings, splitting them on a condition (in this case, an event happening) so that I end up with a structure like

[['some event',
  'reading: 25.4',
  'reading: 23.4',
  'reading: 25.1'],
 ['different event',
  'reading: 22.3',
  'reading: 21.1',
  'reading: 26.0',
  'reading: 25.2'],
 ['another event',
  'reading: 25.5',
  'reading: 25.1']]

In its generic form, the problem is: look for a condition, collect the data until that condition is true again, and repeat.

Right now, I'd do something like

events = []
current_event = []

for line in lines:
    if is_event(line):
        if current_event:
            events.append(current_event)
        current_event = [line]

    else:
        current_event.append(line)
else:
    if current_event:
        events.append(current_event)


def is_event(line):
    return 'event' in line

which produces what I want, but it's ugly and hard to understand. I'm fairly certain there has to be a better way.

My guess is that it involves some itertools wizardry, but I'm new to itertools and can't quite wrap my head around all of it.

Thanks!

Update

I've actually gone with Steve Jessop's answer with a Grouper class. Here's what I'm doing:

class Grouper(object):
    def __init__(self, condition_function):
        self.count = 0
        self.condition_function = condition_function

    def __call__(self, line):
        if self.condition_function(line):
            self.count += 1
        return self.count

and then using it like

import itertools

event_grouper = Grouper(is_event)
result_as_iterators = (x[1] for x in itertools.groupby(lines, event_grouper))

and then to turn it into a list of single-entry dictionaries I do

event_dictionary = [{event: readings} for event, *readings in result_as_iterators]

which gives

[
 {'some event': ['reading: 25.4', 'reading: 23.4', 'reading: 25.1']},
 {'different event': ['reading: 22.3','reading: 21.1','reading: 26.0','reading: 25.2']},
 {'another event': ['reading: 25.5', 'reading: 25.1']}
]
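
To make the role of the Grouper key more concrete, here is a minimal, self-contained sketch. It repeats the class from above so that it runs on its own; the shortened `lines` list and the single-dictionary comprehension `by_event` are illustrative assumptions, not part of the code above.

```python
import itertools


class Grouper:
    """Key function whose return value goes up by one each time the condition holds."""

    def __init__(self, condition_function):
        self.count = 0
        self.condition_function = condition_function

    def __call__(self, line):
        if self.condition_function(line):
            self.count += 1
        return self.count


def is_event(line):
    return 'event' in line


# Shortened sample data (an assumption made for this sketch).
lines = [
    'some event', 'reading: 25.4', 'reading: 23.4',
    'different event', 'reading: 22.3',
]

# The key every line receives: it changes only when a new event starts,
# which is what lets groupby split the list at each event.
key = Grouper(is_event)
keys = [key(line) for line in lines]

# One dictionary instead of a list of one-entry dictionaries. Each group is
# unpacked immediately, before groupby moves on to the next group.
by_event = {
    event: readings
    for _, (event, *readings) in itertools.groupby(lines, Grouper(is_event))
}
```

A single dictionary only fits if event names are unique; a repeated name would overwrite the earlier entry, which the list of one-entry dictionaries avoids.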

Joe Pinsonault asked Dec 18 '13 18:12


2 Answers

I doubt itertools (or collections) can make it clearer than this, unless the exact pattern is implemented in there somewhere.

Two things I notice:

  • You always have a current event (since the first line is an event)
  • You always append the line to the current event (so the event itself is always current_event[0])

So you can skip checking whether you have a current event, and you don't have to special-case creating it either. Additionally, since the "current" event is always the last one, we can just use a negative index to jump straight to it:

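def is_event(line):
    return 'event' in line

events = []

for line in lines:
    if is_event(line):
        # Start a new group; events[-1] is now the current event.
        events.append([])
    # Relies on the first line being an event, so events is never empty here.
    events[-1].append(line)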
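
The loop above relies on the first line being an event; otherwise `events[-1]` raises an IndexError on an empty list. If that cannot be guaranteed, one possible variation (a sketch, with `group_on` and `starts_group` as hypothetical names introduced here) is to open a group whenever none exists yet:

```python
def group_on(items, starts_group):
    """Split items into lists, opening a new list whenever starts_group(item) is true.

    Items that appear before the first match are collected into a
    leading group of their own instead of causing an IndexError.
    """
    groups = []
    for item in items:
        if starts_group(item) or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups


def is_event(line):
    return 'event' in line


# Sample data with a stray reading before the first event (an assumption).
lines = [
    'reading: 20.0',
    'some event', 'reading: 25.4',
    'different event', 'reading: 22.3',
]

events = group_on(lines, is_event)
```

With this input the stray reading ends up in its own first group, so the caller can decide whether to keep or discard it.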

Izkata answered Sep 30 '22 16:09


With itertools.groupby, you can easily group things based on a key, like 'event' in line. So, as a first step:

>>> import itertools
>>> for k, g in itertools.groupby(lines, lambda line: 'event' in line):
...     print(k, list(g))

Of course this doesn't put the events together with their values. I suspect you really don't want the events together with their values, but would actually prefer to have a dict of event: [values] or a list of (event, [values]). In which case you're nearly done. For example, to get that dict, just use the grouper recipe (or zip(*[iter(groups)]*2)) to group into pairs, then use a dict comprehension to map each k, v in those pairs to next(k): list(v).

On the other hand, if you really do want them together, it's the same steps, but with a list of [next(k)] + list(v) at the end.
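
The description above might be turned into code roughly as follows. This is a sketch under a few assumptions: the input starts with an event, every event is followed by at least one reading, and the names `alternating`, `by_event` and `as_lists` are made up for illustration. One wrinkle: a group iterator returned by groupby is no longer usable once groupby advances to the next group, so each group is copied into a list as soon as it is produced rather than calling next() on it later.

```python
import itertools

lines = [
    'some event', 'reading: 25.4', 'reading: 23.4', 'reading: 25.1',
    'different event', 'reading: 22.3', 'reading: 21.1',
    'reading: 26.0', 'reading: 25.2',
    'another event', 'reading: 25.5', 'reading: 25.1',
]


def is_event(line):
    return 'event' in line


# Step 1: groupby alternates between runs of event lines (key True) and
# runs of reading lines (key False). Materialize each run right away.
alternating = [(k, list(g)) for k, g in itertools.groupby(lines, is_event)]

# Step 2: pair every event run with the reading run that follows it,
# using the zip(*[iter(...)] * 2) idiom mentioned above. A trailing event
# with no readings would be dropped silently by zip.
by_event = {
    event_run[0]: readings
    for (_, event_run), (_, readings) in zip(*[iter(alternating)] * 2)
}

# The same pairing, but keeping event and readings together in one list.
as_lists = [
    event_run + readings
    for (_, event_run), (_, readings) in zip(*[iter(alternating)] * 2)
]
```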

However, if you don't actually understand groupby well enough to turn that description into code, you should probably write something you do understand. And that's not too hard:

def groupify(lines):
    event = []
    for line in lines:
        if 'event' in line:
            if event: yield event
            event = [line]
        else:
            event.append(line)
    if event: yield event

Yes, it's 7 lines (condensable to 4 with some tricks) instead of 3 (condensable to 1 by nesting comprehensions in an ugly way), but 7 lines you understand and can debug are more useful than 3 lines of magic.

When you iterate the generator created by this function, it gives you lists of lines, like this:

>>> for event in groupify(lines):
...     print(event)

This will print:

['some event', 'reading: 25.4', 'reading: 23.4', 'reading: 25.1']
['different event', 'reading: 22.3', 'reading: 21.1', 'reading: 26.0', 'reading: 25.2']
['another event', 'reading: 25.5', 'reading: 25.1']

If you want a list instead of a generator (so you can index it, or iterate over it twice), you can do the same thing you do to turn any other iterable into a list:

events = list(groupify(lines))
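
Because groupify is a generator, it can also consume a file lazily instead of a list held in memory. The sketch below repeats the function so that it runs on its own and uses io.StringIO as a stand-in for a real file opened with open(); the sample text and the newline stripping are assumptions made for illustration.

```python
import io


def groupify(lines):
    event = []
    for line in lines:
        if 'event' in line:
            if event:
                yield event
            event = [line]
        else:
            event.append(line)
    if event:
        yield event


# Stand-in for a real file; with a file on disk this would be open(path).
text = (
    "some event\n"
    "reading: 25.4\n"
    "reading: 23.4\n"
    "different event\n"
    "reading: 22.3\n"
)

with io.StringIO(text) as f:
    # Lines read from a file keep their trailing newline, so strip it lazily.
    stripped = (line.rstrip('\n') for line in f)
    # Only as much input is read as is needed to complete the first group.
    first = next(groupify(stripped))
```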

abarnert answered Sep 30 '22 17:09