
Group similar items in a master list and create new lists based on grouped items

Tags: python, list

I am trying to create several new lists from one master list whereby the new lists contain similar items from the master list. Specifically, I have a list of bus routes. Here is a sample data set:

[u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line', u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']

Most bus routes have an inbound (IB) and an outbound (OB) item. Some have multiple IBs and OBs, and some have only one item because they are loop routes. Eventually, I want to merge the IB and OB routes in mapping software (which I already know how to do)...

I originally created the filenames so that the first 5 characters represent the bus route, regardless of whether it is IB or OB. Therefore, I am able to group similar items based on the first 5 characters. For example, when I write:

for route in routes:
    print(route[0:5])

I get:

>>> 
Bus04
Bus04
Bus15
Bus15

How can I group the files for Bus04 together, and likewise the files for Bus15, into new lists, so that I get:

[u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line'] and [u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line'] as separate lists?

I am thinking of looping through the items and looking at the first five characters of each: if no list exists yet for that five-character prefix, create one and add the item to it; otherwise, append the item to the existing list.
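The loop described above can be sketched with a plain dictionary that maps each five-character prefix to its list. This is only a minimal illustration, not a definitive implementation: the names `groups` and `grouped_lists` are hypothetical, and the strings are written without the `u` prefix on the assumption that the code runs under Python 3.

```python
routes = ['Bus04_00_00_IB_pts_Line', 'Bus04_00_00_OB_pts_Line',
          'Bus15_00_00_IB_pts_Line', 'Bus15_00_00_OB_pts_Line']

groups = {}  # hypothetical name: maps route prefix -> list of file names
for route in routes:
    prefix = route[0:5]
    if prefix not in groups:
        # First time this prefix comes up: start a new list for it.
        groups[prefix] = []
    # The list for this prefix exists by now, so append to it.
    groups[prefix].append(route)

# One list per route, ordered by prefix.
grouped_lists = [groups[prefix] for prefix in sorted(groups)]
print(grouped_lists)
```

The explicit `if prefix not in groups` check is the "does a list already exist?" step written out by hand.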

I'm having a hard time writing this out in code, so any help is greatly appreciated!

Kristen G. asked Dec 16 '22 16:12


2 Answers

I would use collections.defaultdict for this:

import collections

L = [u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line', u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']
d = collections.defaultdict(list)
for elem in L:
    d[elem.split('_')[0]].append(elem)
print(dict(d))

This produces:

{u'Bus04': [u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line'],
 u'Bus15': [u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']}

Unlike some of the other solutions proposed thus far, this works irrespective of the order in which entries appear in the input list.
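To illustrate the point about ordering, here is a small sketch that runs the same defaultdict grouping on a deliberately shuffled copy of the sample data and then pulls the groups out as separate lists. The helper name `group_by_route` and the variables `shuffled` and `separate_lists` are made up for this example, and the strings omit the `u` prefix on the assumption of Python 3.

```python
import collections


def group_by_route(names):
    """Group file names by the text before the first underscore."""
    d = collections.defaultdict(list)
    for name in names:
        d[name.split('_')[0]].append(name)
    return dict(d)


# Same sample data, deliberately out of order.
shuffled = ['Bus15_00_00_IB_pts_Line', 'Bus04_00_00_IB_pts_Line',
            'Bus15_00_00_OB_pts_Line', 'Bus04_00_00_OB_pts_Line']

groups = group_by_route(shuffled)

# Separate lists, one per route, ordered by route name.
separate_lists = [groups[key] for key in sorted(groups)]
print(separate_lists)
```

Within each group the items keep the relative order they had in the input, since each one is appended as it is encountered.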

NPE answered Dec 21 '22 11:12


You can use itertools.groupby with a custom key function such as lambda x: x[0:5].

Here's a demo that materializes each group as a list (groupby itself yields lazy iterators):

>>> import itertools
>>> lst = [u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line', u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']
>>> [(key, list(val)) for key, val in itertools.groupby(lst, lambda x: x[0:5])]
[(u'Bus04', [u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line']),
 (u'Bus15', [u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line'])]
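One caveat: itertools.groupby only groups consecutive items that share a key, so the demo above relies on the sample list already being ordered by route. The following sketch shows what happens on unordered input and how sorting by the same key first addresses it. The names `route_key`, `unsorted_lst`, `naive`, and `fixed` are hypothetical, and the strings omit the `u` prefix on the assumption of Python 3.

```python
import itertools


def route_key(name):
    """Return the first five characters, which identify the bus route."""
    return name[0:5]


unsorted_lst = ['Bus15_00_00_IB_pts_Line', 'Bus04_00_00_IB_pts_Line',
                'Bus15_00_00_OB_pts_Line', 'Bus04_00_00_OB_pts_Line']

# Without sorting, every change of key starts a new group, so the same
# route shows up more than once in the result.
naive = [(key, list(val))
         for key, val in itertools.groupby(unsorted_lst, route_key)]

# Sorting by the same key first makes equal keys adjacent.
fixed = [(key, list(val))
         for key, val in itertools.groupby(sorted(unsorted_lst, key=route_key),
                                           route_key)]

print(len(naive))
print(fixed)
```

Because `sorted` is stable, items within each route keep their original relative order.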

ThiefMaster answered Dec 21 '22 11:12