I have a list:
L = ['v1_A', 'v1_B', 'v1_C', 'V2_A', 'V2_B', 'V2000_A']
and I want to split it into sublists, so that all values with the prefix "v1" become one sublist, all values with the prefix "V2" another, all values with the prefix "V2000" a third, and so on.
The length and number of sublists can differ, but each one is identified by the part before the underscore.
You could use numpy's array_split function, e.g., np.array_split(np.array(data), 20), to split into 20 nearly equal-size chunks. To make sure chunks are exactly equal in size use np.
If you want to group your strings by the initial value, you have two options:
Use itertools.groupby(); this makes grouping easy provided your data is already sorted on that first value:
from itertools import groupby
grouped = [list(g) for k, g in groupby(L, lambda s: s.partition('_')[0])]
The lambda here provides groupby() with the value to group on; it'll give you separate iterators (assigned to g in the above code) that yield the values for which the group key doesn't vary. As the lambda produces the first part of each string, the input is grouped on your v1, V2, V2000, etc. prefixes.
Use a dictionary to group items by the common prefix. Use this if your input is not sorted:
grouped = {}
for elem in L:
    key = elem.partition('_')[0]
    grouped.setdefault(key, []).append(elem)
grouped = grouped.values()
If you use Python 3, that last line would be grouped = list(grouped.values()), because dict.values() returns a view there rather than a list.
Both produce a nested list with one sublist per prefix. Both use str.partition() to split off just the part before the first underscore (_).
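If the input is not sorted but you still want to use groupby(), one option is to sort by the same key first. The following is only a sketch: the helper names prefix and group_sorted and the shuffled sample list are made up for illustration and are not part of the original answer.

```python
from itertools import groupby

def prefix(s):
    # Hypothetical helper: the part before the first underscore
    # (the whole string if there is no underscore).
    return s.partition('_')[0]

def group_sorted(items):
    # Hypothetical helper: sort by the prefix so equal prefixes are adjacent,
    # then let groupby() collect each run into its own sublist.
    ordered = sorted(items, key=prefix)
    return [list(g) for _, g in groupby(ordered, key=prefix)]

# Same values as L in the question, deliberately out of order.
shuffled = ['V2_A', 'v1_A', 'V2000_A', 'v1_B', 'V2_B', 'v1_C']
print(group_sorted(shuffled))
# [['V2_A', 'V2_B'], ['V2000_A'], ['v1_A', 'v1_B', 'v1_C']]
```

Note that the groups come out in sorted prefix order (uppercase before lowercase), not in the order they first appear in the input; the dictionary approach is the better fit if that matters.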
Demo:
>>> from itertools import groupby
>>> L= ['v1_A', 'v1_B', 'v1_C', 'V2_A', 'V2_B', 'V2000_A']
>>> [list(g) for k, g in groupby(L, lambda s: s.partition('_')[0])]
[['v1_A', 'v1_B', 'v1_C'], ['V2_A', 'V2_B'], ['V2000_A']]
>>> grouped = {}
>>> for elem in L:
...     key = elem.partition('_')[0]
...     grouped.setdefault(key, []).append(elem)
...
>>> grouped.values()
[['V2_A', 'V2_B'], ['V2000_A'], ['v1_A', 'v1_B', 'v1_C']]
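The dictionary output above is in arbitrary order, as it would be on Python 2. As a rough sketch for Python 3.7 or later, where dictionaries keep insertion order, the same idea can be written with collections.defaultdict; the function name group_by_prefix is just an illustrative choice, not something from the answer above.

```python
from collections import defaultdict

def group_by_prefix(items):
    # Hypothetical helper: map each prefix (text before the first '_')
    # to the list of items that share it.
    groups = defaultdict(list)
    for item in items:
        groups[item.partition('_')[0]].append(item)
    # On Python 3.7+ the groups come back in order of first appearance.
    return list(groups.values())

L = ['v1_A', 'v1_B', 'v1_C', 'V2_A', 'V2_B', 'V2000_A']
print(group_by_prefix(L))
# [['v1_A', 'v1_B', 'v1_C'], ['V2_A', 'V2_B'], ['V2000_A']]
```

This does not require the input to be sorted, and items with the same prefix end up together even when they are scattered through the list.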
L= ['v1_A', 'v1_B', 'v1_C', 'V2_A', 'V2_B', 'V2000_A']
new_L = []
for i in L:
    new_item = i.split('_')
    new_L.append(new_item)
print(new_L)
Output: [['v1', 'A'], ['v1', 'B'], ['v1', 'C'], ['V2', 'A'], ['V2', 'B'], ['V2000', 'A']]
Hope this gives you the desired result.