Group similar items in a master list and create new lists based on grouped items

Question

I am trying to create several new lists from one master list whereby the new lists contain similar items from the master list. Specifically, I have a list of bus routes. Here is a sample data set:

[u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line', u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']

Most bus routes have an inbound (IB) and an outbound (OB) item (some have multiple IBs and OBs, and some have only one item because they are loop routes). Eventually, I want to merge the IB and OB routes in mapping software (which I already know how to do)...

I originally created the filenames so that the first 5 characters represent the bus route, regardless of whether it's IB or OB. Therefore, I am able to group similar items based on the first 5 characters. For example, when I write:

for route in routes:
    print route[0:5]

I get:

>>> 
Bus04
Bus04
Bus15
Bus15

How can I "group" the files that pertain to Bus04 and Bus04, and Bus15 and Bus15 into new lists, such that I get:

[u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line'] and [u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line'] as separate lists?

I am thinking of something along the lines of looping through each item and looking at its first five characters, then either creating a new list for each new five-character prefix that comes up (and adding the item to that new list) or, if a list for that prefix already exists, appending the item to it.

I'm having a hard time writing this out in code, so any help is greatly appreciated!

NPE · Accepted Answer

I would use collections.defaultdict for this:

import collections

L = [u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line', u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']
d = collections.defaultdict(list)
for elem in L:
    d[elem.split('_')[0]].append(elem)
print(dict(d))

This produces:

{u'Bus04': [u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line'],
 u'Bus15': [u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']}

Unlike some of the other solutions proposed thus far, this works irrespective of the order in which entries appear in the input list.
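To illustrate the point about ordering, and to get the groups out as separate lists rather than as a dict, here is a minimal sketch. The helper name group_by_prefix and the deliberately shuffled sample list are my own assumptions for illustration, not part of the original data:

```python
import collections

def group_by_prefix(names, sep='_'):
    """Group names by the text before the first separator (hypothetical helper)."""
    groups = collections.defaultdict(list)
    for name in names:
        groups[name.split(sep)[0]].append(name)
    return dict(groups)

# Deliberately shuffled input: entries for the same route are not adjacent.
routes = ['Bus15_00_00_IB_pts_Line', 'Bus04_00_00_IB_pts_Line',
          'Bus15_00_00_OB_pts_Line', 'Bus04_00_00_OB_pts_Line']

groups = group_by_prefix(routes)

# One list per route, ordered by route name.
route_lists = [groups[key] for key in sorted(groups)]
```

Within each group the items keep the order in which they appeared in the input, so route_lists can be handed straight to whatever does the merging.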

ThiefMaster · Answer

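You can use itertools.groupby with a custom key function such as lambda x: x[0:5]. Note that groupby only groups consecutive items with the same key, so the list must already be ordered (or at least grouped) by that key, as the sample data is.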

Here's a demo that gives you a static list (i.e. not just generators):

>>> import itertools
>>> lst = [u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line', u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']
>>> [(key, list(val)) for key, val in itertools.groupby(lst, lambda x: x[0:5])]
[(u'Bus04', [u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line']),
 (u'Bus15', [u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line'])]
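If the input might not be ordered by route, one option is to sort by the same key before grouping. The following is only a sketch: the route_prefix function and the shuffled sample list are assumptions made for the example.

```python
import itertools

def route_prefix(name):
    """Return the first five characters, e.g. 'Bus04' (hypothetical key function)."""
    return name[0:5]

# Shuffled input: entries for the same route are not adjacent.
routes = ['Bus15_00_00_IB_pts_Line', 'Bus04_00_00_IB_pts_Line',
          'Bus15_00_00_OB_pts_Line', 'Bus04_00_00_OB_pts_Line']

# Without sorting, groupby starts a new group every time the key changes,
# so this shuffled input yields four single-item groups.
unsorted_groups = [list(members)
                   for _, members in itertools.groupby(routes, key=route_prefix)]

# Sorting by the same key first brings each route's entries together.
sorted_routes = sorted(routes, key=route_prefix)
grouped = [list(members)
           for _, members in itertools.groupby(sorted_routes, key=route_prefix)]
```

Because sorted is stable, items within each group stay in their original relative order.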
