
Split list into sublist based on part of value

Tags:

python

I have a list:

L= ['v1_A', 'v1_B', 'v1_C', 'V2_A', 'V2_B', 'V2000_A']

and I want to split it into sublists so that all values that contain "v1" become one sublist, all values that contain "V2" become another, likewise for "V2000", etc.

The length and number of sublists can vary, but each sublist is identified by the part before the underscore.

user3910001 asked Aug 05 '14 11:08




2 Answers

If you want to group your strings by the initial value, you have two options:

  1. Use itertools.groupby(); this makes grouping easy provided your data is already sorted on that first value:

    from itertools import groupby
    
    grouped = [list(g) for k, g in groupby(L, lambda s: s.partition('_')[0])]
    

    The lambda here provides groupby() with the value to group on; groupby() then gives you a separate iterator per group (assigned to g in the above code), each yielding consecutive values for which the group key doesn't change. As the lambda produces the first part of each string, the input is grouped on your v1, V2, V2000, etc. prefixes.

  2. Use a dictionary to group items by the common prefix. Use this if your input is not sorted:

    grouped = {}
    for elem in L:
        key = elem.partition('_')[0]
        grouped.setdefault(key, []).append(elem)
    grouped = grouped.values()
    

    In Python 3, dict.values() returns a view rather than a list, so that last line would be grouped = list(grouped.values()).

Both produce a list of sublists, one per prefix, with all values that share a prefix collected together. Both use str.partition() to split off just the part before the first _ underscore.
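
The dictionary approach can also be written with collections.defaultdict, which avoids the setdefault() call. The following is a minimal sketch rather than part of the original answer; the variable name groups is an illustrative choice. On Python 3.7 and later, dictionaries preserve insertion order, so the sublists come out in the order in which each prefix first appears.

```python
from collections import defaultdict

L = ['v1_A', 'v1_B', 'v1_C', 'V2_A', 'V2_B', 'V2000_A']

# 'groups' is an illustrative name; missing keys start out as empty lists.
groups = defaultdict(list)
for elem in L:
    # Group key: the part before the first underscore.
    key = elem.partition('_')[0]
    groups[key].append(elem)

# Convert the dict view to a plain list of sublists.
grouped = list(groups.values())
print(grouped)
```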

Demo:

>>> from itertools import groupby
>>> L= ['v1_A', 'v1_B', 'v1_C', 'V2_A', 'V2_B', 'V2000_A']
>>> [list(g) for k, g in groupby(L, lambda s: s.partition('_')[0])]
[['v1_A', 'v1_B', 'v1_C'], ['V2_A', 'V2_B'], ['V2000_A']]
>>> grouped = {}
>>> for elem in L:
...     key = elem.partition('_')[0]
...     grouped.setdefault(key, []).append(elem)
... 
>>> grouped.values()
[['V2_A', 'V2_B'], ['V2000_A'], ['v1_A', 'v1_B', 'v1_C']]
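
If the input is not sorted but you would still like to use groupby(), one option is to sort by the same key first so that equal prefixes end up next to each other. This is a sketch under assumptions: the helper name prefix and the shuffled list unsorted_L are made up for illustration and do not come from the question.

```python
from itertools import groupby

def prefix(s):
    # Hypothetical helper: the part before the first underscore.
    return s.partition('_')[0]

# Illustrative unsorted input (not from the question).
unsorted_L = ['V2_A', 'v1_A', 'V2000_A', 'v1_B', 'V2_B', 'v1_C']

# Sorting by prefix makes equal prefixes adjacent, which groupby() relies on.
# sorted() is stable, so items keep their relative order within each group.
grouped = [list(g) for _, g in groupby(sorted(unsorted_L, key=prefix), key=prefix)]
print(grouped)
```

Note that the sort is case-sensitive, so uppercase prefixes such as V2 sort before lowercase ones such as v1.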

Martijn Pieters answered Oct 20 '22 22:10


L = ['v1_A', 'v1_B', 'v1_C', 'V2_A', 'V2_B', 'V2000_A']
new_L = []
for i in L:
    # Split each string into its prefix and suffix at the underscore.
    new_item = i.split('_')
    new_L.append(new_item)
print(new_L)

Output: [['v1', 'A'], ['v1', 'B'], ['v1', 'C'], ['V2', 'A'], ['V2', 'B'], ['V2000', 'A']]

Hope this gives you the desired result.

Sesha answered Oct 20 '22 21:10