I am trying to create several new lists from one master list whereby the new lists contain similar items from the master list. Specifically, I have a list of bus routes. Here is a sample data set:
[u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line', u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']
Most bus routes have an inbound (IB) and an outbound (OB) item (some have multiple IBs and OBs, and some have only one item because they are loop routes). Eventually, I want to merge the IB and OB routes in mapping software (which I already know how to do)...
I originally created the filenames so that the first 5 characters represent the bus route, regardless of whether it's IB or OB. Therefore, I am able to group similar items based on the first 5 characters. For example, when I write:
for route in routes:
    print(route[0:5])
I get:
>>>
Bus04
Bus04
Bus15
Bus15
How can I "group" the files that pertain to Bus04 and Bus04, and Bus15 and Bus15, into new lists, such that I get:
[u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line']
and [u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']
as separate lists?
I am thinking of something along the lines of looping through each item, looking at its first five characters, and then either creating a new list for each new five-character prefix that comes up (and adding the item to it) or, if a list for that prefix already exists, appending the item to it.
I'm having a hard time writing this out in code, so any help is greatly appreciated!
I would use collections.defaultdict for this:
import collections
L = [u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line', u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']
d = collections.defaultdict(list)
for elem in L:
    d[elem.split('_')[0]].append(elem)
print(dict(d))
This produces:
{u'Bus04': [u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line'],
u'Bus15': [u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']}
Unlike some of the other solutions proposed thus far, this works irrespective of the order in which entries appear in the input list.
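To illustrate the order-independence point, here is a minimal sketch that runs the same approach on a deliberately shuffled copy of the sample list and then pulls out the separate lists the question asks for. The function name group_by_route and the shuffled order are made up for this illustration; they are not from the question.

```python
import collections

def group_by_route(names):
    """Group file names by the text before the first underscore."""
    groups = collections.defaultdict(list)
    for name in names:
        groups[name.split('_')[0]].append(name)
    return dict(groups)

# Same sample names as above, deliberately out of order (hypothetical input).
shuffled = [u'Bus15_00_00_OB_pts_Line', u'Bus04_00_00_IB_pts_Line',
            u'Bus15_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line']

grouped = group_by_route(shuffled)

# One list per route, ordered by route name.
separate_lists = [grouped[key] for key in sorted(grouped)]
print(separate_lists)
```

Within each group the items keep the order in which they appeared in the input, so sort each list afterwards if IB should always come before OB.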
You can use itertools.groupby with a custom key function such as lambda x: x[0:5].
Here's a demo that gives you a static list (i.e. not just generators):
>>> import itertools
>>> lst = [u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line', u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line']
>>> [(key, list(val)) for key, val in itertools.groupby(lst, lambda x: x[0:5])]
Out[9]:
[(u'Bus04', [u'Bus04_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line']),
(u'Bus15', [u'Bus15_00_00_IB_pts_Line', u'Bus15_00_00_OB_pts_Line'])]
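One caveat: itertools.groupby only groups consecutive items that share a key, so the demo above relies on the input already being ordered by route. A minimal sketch of handling unordered input by sorting on the same key first follows; the helper name route_key and the shuffled list are assumptions made for this illustration.

```python
import itertools

def route_key(name):
    """Return the first five characters, e.g. u'Bus04'."""
    return name[0:5]

# Hypothetical input: the same sample names, deliberately out of order.
unsorted_lst = [u'Bus15_00_00_OB_pts_Line', u'Bus04_00_00_IB_pts_Line',
                u'Bus15_00_00_IB_pts_Line', u'Bus04_00_00_OB_pts_Line']

# Without sorting, every change of key starts a new group.
unsorted_keys = [key for key, _ in itertools.groupby(unsorted_lst, key=route_key)]

# Sorting by the same key first puts each route's items next to each other.
groups = [(key, list(val))
          for key, val in itertools.groupby(sorted(unsorted_lst, key=route_key),
                                            key=route_key)]
print(unsorted_keys)
print(groups)
```

Because sorted is stable, items within each group stay in their original relative order.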