Python: Grouping into timeslots (minutes) for days of data

I have a list of events, timestamped to millisecond accuracy, that spans a few days. I want to cluster all the events that occur in each 'per-n-minutes' slot (a slot can hold twenty events or none at all). I have a datetime.datetime item for each event, so I can get datetime.datetime.minute without any trouble.

My list of events is sorted in time order, earliest first, latest last. The list is complete for the time period I am working on.

The idea is that I can change the list:

[[a],[b],[c],[d],[e],[f],[g],[h],[i]...]

where a, b, c occur between mins 0 and 29; d, e, f, g occur between mins 30 and 59; nothing occurs between mins 0 and 29 of the next hour; h, i occur between mins 30 and 59 ...

into a new list:

[[[a],[b],[c]],[[d],[e],[f],[g]],[],[[h],[i]]...]

I'm not sure how to build an iterator that loops through the two time slots until the time series list ends. Anything I can think of using xrange stops once it completes, so I wondered if there was a way of using 'while' to do the slicing?

I will also be using a smaller timeslot, probably 5 mins; I used 30 mins here to keep the example short.

(For context, I'm making a geo-plotted, time-based view of the recent quakes in New Zealand, and want to show all the quakes that occur in a small block of time in one step to speed up the replay.)

Jay Gattuso asked Jul 25 '13 07:07


1 Answer

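# create sample data: 100 timestamps, one minute apart
from datetime import datetime, timedelta
d = datetime.now()
data = [d + timedelta(minutes=i) for i in range(100)]

# prepare and group the data
from itertools import groupby

def get_key(d):
    # group by 30 minutes: round down to the previous :00 or :30 boundary
    k = d - timedelta(minutes=d.minute % 30)
    # drop seconds and microseconds so every item in a slot shares one key
    return datetime(k.year, k.month, k.day, k.hour, k.minute, 0)

# groupby needs sorted input and returns a one-shot iterator;
# rebuild g if you want to iterate over the groups a second time
g = groupby(sorted(data), key=get_key)

# print data
for key, items in g:
    print(key)
    for item in items:
        print('-', item)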

This is a Python translation of this answer, which works by rounding each datetime down to the start of its 30-minute slot and using that as the grouping key.
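
The same round-down idea can be extended to other slot widths, such as the 5 minutes mentioned in the question. The following is only a sketch under assumptions: floor_to_slot is a hypothetical helper name that does not appear in the original answer, it assumes Python 3 (where one timedelta can be floor-divided by another), and it counts slots from midnight of the event's own day.

```python
from datetime import datetime, timedelta

def floor_to_slot(dt, minutes=5):
    """Round dt down to the start of its n-minute slot (hypothetical helper)."""
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    slot = timedelta(minutes=minutes)
    # timedelta // timedelta gives the number of whole slots since midnight
    return midnight + ((dt - midnight) // slot) * slot

print(floor_to_slot(datetime(2013, 7, 25, 7, 7, 42, 500000)))      # 2013-07-25 07:05:00
print(floor_to_slot(datetime(2013, 7, 25, 7, 7, 42), minutes=30))  # 2013-07-25 07:00:00
```

A function like this could be passed to groupby as the key, for example key=lambda d: floor_to_slot(d, minutes=5), in place of get_key.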


If you really need the empty groups as well, you can add them with this or a similar method:

def add_missing_empty_frames(g):
    last_key = None
    for key, items in g:
        if last_key:
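            # compare timedeltas directly: .seconds ignores whole days,
            # so a gap of a day or more would not be filled correctly
            while key - last_key > timedelta(minutes=30):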
                empty_key = last_key + timedelta(minutes=30)
                yield (empty_key, [])
                last_key = empty_key
        yield (key, items)
        last_key = key

for key, items in add_missing_empty_frames(g):
    ...
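
If the goal is the nested list from the question, with an empty list for every slot that has no events, a single pass with a 'while' loop may be enough. This is a minimal sketch, not the answer's method: group_into_slots is a hypothetical name, it assumes Python 3, it assumes the input is a list of datetime objects already sorted earliest first (as the question states), and it counts slots from midnight of the first event's day.

```python
from datetime import datetime, timedelta

def group_into_slots(times, minutes=30):
    """Return one list per consecutive slot, empty slots included (a sketch)."""
    if not times:
        return []
    slot = timedelta(minutes=minutes)
    first = times[0]
    midnight = first.replace(hour=0, minute=0, second=0, microsecond=0)
    # start of the slot that contains the first event
    start = midnight + ((first - midnight) // slot) * slot
    slots = [[]]
    for t in times:
        # open new (possibly empty) slots until t falls inside the current one
        while t >= start + slot:
            start += slot
            slots.append([])
        slots[-1].append(t)
    return slots

base = datetime(2013, 7, 25, 7, 0)
# minute offsets chosen to mirror a..i in the question
times = [base + timedelta(minutes=m) for m in (1, 10, 29, 30, 45, 50, 59, 95, 110)]
print([len(s) for s in group_into_slots(times)])  # [3, 4, 0, 2]
```

Passing minutes=5 gives the smaller slots mentioned in the question. If each event is a record rather than a bare datetime, the comparisons would need to use whatever field holds the timestamp.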

sloth answered Nov 15 '22 09:11