Not sure how to title this question. I've run into a few situations where I have a list of data, maybe annotated with some property, and I want to collect them into groups.
For example, maybe I have a file like this:
some event
reading: 25.4
reading: 23.4
reading: 25.1
different event
reading: 22.3
reading: 21.1
reading: 26.0
reading: 25.2
another event
reading: 25.5
reading: 25.1
and I want to group each set of readings, splitting them on a condition (in this case, an event happening) so that I end up with a structure like
[['some event',
'reading: 25.4',
'reading: 23.4',
'reading: 25.1'],
['different event',
'reading: 22.3',
'reading: 21.1',
'reading: 26.0',
'reading: 25.2'],
['another event',
'reading: 25.5',
'reading: 25.1']]
In its generic form, it is: look for a condition, collect the data until that condition is true again, repeat.
Right now, I'd do something like
    events = []
    current_event = []
    for line in lines:
        if is_event(line):
            if current_event:
                events.append(current_event)
            current_event = [line]
        else:
            current_event.append(line)
    else:
        if current_event:
            events.append(current_event)

    def is_event(line):
        return 'event' in line
which produces what I want, but it's ugly and hard to understand. I'm fairly certain there has to be a better way.
My guess is that it involves some itertools wizardry, but I'm new to itertools and can't quite wrap my head around all of it.
Thanks!
I've actually gone with Steve Jessop's answer with a Grouper class. Here's what I'm doing:
    class Grouper(object):
        def __init__(self, condition_function):
            self.count = 0
            self.condition_function = condition_function

        def __call__(self, line):
            if self.condition_function(line):
                self.count += 1
            return self.count
and then using it like
event_grouper = Grouper(is_event)
result_as_iterators = (x[1] for x in itertools.groupby(lines, event_grouper))
and then to turn it into a list of dictionaries I do
event_dictionary = [{event: readings} for event, *readings in result_as_iterators]
which gives
[
{'some event': ['reading: 25.4', 'reading: 23.4', 'reading: 25.1']},
{'different event': ['reading: 22.3','reading: 21.1','reading: 26.0','reading: 25.2']},
{'another event': ['reading: 25.5', 'reading: 25.1']}
]
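For reference, here is a self-contained sketch of the whole pipeline. It assumes Python 3 and that lines is an in-memory list holding the sample data from above (in practice it would be read from the file). The sample data and the names groups and event_list are assumptions for illustration, not part of the original code.

```python
import itertools

# Sample data, assumed to match the file shown in the question.
lines = [
    'some event', 'reading: 25.4', 'reading: 23.4', 'reading: 25.1',
    'different event', 'reading: 22.3', 'reading: 21.1',
    'reading: 26.0', 'reading: 25.2',
    'another event', 'reading: 25.5', 'reading: 25.1',
]


def is_event(line):
    return 'event' in line


class Grouper:
    """Key function for groupby: the key changes each time the condition fires."""

    def __init__(self, condition_function):
        self.count = 0
        self.condition_function = condition_function

    def __call__(self, line):
        if self.condition_function(line):
            self.count += 1
        return self.count


groups = (group for _, group in itertools.groupby(lines, Grouper(is_event)))
# Each group has to be consumed before groupby advances to the next one;
# unpacking it inside the comprehension does that immediately.
event_list = [{event: readings} for event, *readings in groups]
print(event_list)
```

Note that a Grouper instance keeps its count between calls, so a fresh instance is needed for each pass over the data.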
I doubt itertools (or collections) can make it clearer than this, unless the exact pattern is implemented in there somewhere.
Two things I notice:
current_event[0]
So you can skip checking whether you have a current event, and you don't have to special-case creating it either. Additionally, since the "current" event is always the last one, we can just use a negative index to jump straight to it:
    events = []
    for line in lines:
        if is_event(line):
            events.append([])
        events[-1].append(line)

    def is_event(line):
        return 'event' in line
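One caveat: this loop relies on the first line being an event, since indexing events[-1] on an empty list raises IndexError. Here is a minimal sketch that wraps the loop in a function and makes that assumption explicit. The name group_events and the choice to raise ValueError are my own, not something from the question.

```python
def is_event(line):
    return 'event' in line


def group_events(lines):
    """Group lines into lists, starting a new list at every event line."""
    events = []
    for line in lines:
        if is_event(line):
            events.append([])
        elif not events:
            # Assumption: data before the first event is treated as an error.
            raise ValueError('data before first event: %r' % (line,))
        events[-1].append(line)
    return events


sample = ['some event', 'reading: 25.4', 'another event', 'reading: 25.5']
print(group_events(sample))
```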
With itertools.groupby, you can easily group things based on a key, like 'event' in line. So, as a first step:
    >>> for k, g in itertools.groupby(lines, lambda line: 'event' in line):
    ...     print(k, list(g))
Of course this doesn't put the events together with their values. I suspect you really don't want the events together with their values, but would actually prefer to have a dict of event: [values] or a list of (event, [values]). In which case you're nearly done. For example, to get that dict, just use the grouper recipe (or zip(*[iter(groups)]*2)) to group into pairs, then use a dict comprehension to map each k, v in those pairs to next(k): list(v).
On the other hand, if you really do want them together, it's the same steps, but with a list of [next(k)] + list(v) at the end.
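The steps described above might look like the following sketch. It turns each group into a list before pairing, because a groupby group stops being usable once groupby has advanced to the next one, and zip fetches two groups at a time. As a result it uses k[0] where the description says next(k). It also assumes the input starts with an event and that every event has at least one reading, since two consecutive event lines would land in the same group. The sample data and variable names are assumptions.

```python
import itertools

lines = [
    'some event', 'reading: 25.4', 'reading: 23.4',
    'different event', 'reading: 22.3',
]

# Alternating groups: [event], [readings], [event], [readings], ...
groups = [list(g) for _, g in itertools.groupby(lines, lambda line: 'event' in line)]

# Pair each event group with the readings group that follows it.
pairs = list(zip(*[iter(groups)] * 2))

as_dict = {k[0]: v for k, v in pairs}   # event: [values]
as_lists = [k + v for k, v in pairs]    # [event, value, value, ...]

print(as_dict)
print(as_lists)
```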
However, if you don't actually understand groupby well enough to turn that description into code, you should probably write something you do understand. And that's not too hard:
    def groupify(lines):
        event = []
        for line in lines:
            if 'event' in line:
                if event: yield event
                event = [line]
            else:
                event.append(line)
        if event: yield event
Yes, it's 7 lines (condensable to 4 with some tricks) instead of 3 (condensable to 1 by nesting comprehensions in an ugly way), but 7 lines you understand and can debug are more useful than 3 lines of magic.
When you iterate the generator created by this function, it gives you lists of lines, like this:
    >>> for event in groupify(lines):
    ...     print(event)
This will print:
['some event', 'reading: 25.4', 'reading: 23.4', 'reading: 25.1']
['different event', 'reading: 22.3', 'reading: 21.1', 'reading: 26.0', 'reading: 25.2']
['another event', 'reading: 25.5', 'reading: 25.1']
If you want a list instead of a generator (so you can index it, or iterate over it twice), you can do the same thing you do to turn any other iterable into a list:
events = list(groupify(lines))
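Since the question asks about the generic form, the same generator can take the condition as a parameter instead of hard-coding it. A sketch, where the name split_before and the starts_group parameter are hypothetical:

```python
def split_before(items, starts_group):
    """Yield lists of items, starting a new list whenever starts_group(item) is true."""
    group = []
    for item in items:
        if starts_group(item) and group:
            yield group
            group = []
        group.append(item)
    if group:
        yield group


lines = ['some event', 'reading: 25.4', 'another event', 'reading: 25.5']
events = list(split_before(lines, lambda line: 'event' in line))
print(events)
```

As with groupify, any items that come before the first match end up in a group of their own.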