If we have a list of strings in Python and want to split it into sublists at every occurrence of some special string, how should we do it?
For instance:
l = ["data","more data","","data 2","more data 2","danger","","date3","lll"]
p = split_special(l,"")
would generate:
p = [["data","more data"],["data 2","more data 2","danger"],["date3","lll"]]
To split the individual strings inside a list in Python, use a list comprehension to iterate over the list, call split() on each string, and keep the part of each result you want.
The split() method of the string class is fairly straightforward. Given a delimiter, it splits the string and returns a list of the pieces. If you omit the delimiter argument, the string is split on runs of whitespace.
array_split() is a NumPy function that splits an array (or list) into a given number of sub-arrays of nearly equal size. Its second argument is the number of sections, not the size of each chunk.
itertools.groupby is one approach (as it often is):
>>> l = ["data","more data","","data 2","more data 2","danger","","date3","lll"]
>>> from itertools import groupby
>>> groupby(l, lambda x: x == "")
<itertools.groupby object at 0x9ce06bc>
>>> [list(group) for k, group in groupby(l, lambda x: x == "") if not k]
[['data', 'more data'], ['data 2', 'more data 2', 'danger'], ['date3', 'lll']]
We can even cheat a little in this particular case, because the separator is the empty string, which is falsy, so bool itself works as the key function:
>>> [list(group) for k, group in groupby(l, bool) if k]
[['data', 'more data'], ['data 2', 'more data 2', 'danger'], ['date3', 'lll']]
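The groupby answer above can be wrapped into the split_special function the question asks for. This is only a minimal sketch: the function name comes from the question, while the parameter names items and sep are assumptions made for illustration.

```python
from itertools import groupby


def split_special(items, sep):
    """Split items into sublists at every element equal to sep.

    Separator elements are dropped, and runs of consecutive separators
    do not produce empty sublists.
    """
    return [
        list(group)
        # groupby yields (key, group) pairs; the key is True for runs of separators
        for is_sep, group in groupby(items, lambda x: x == sep)
        if not is_sep
    ]


l = ["data", "more data", "", "data 2", "more data 2", "danger", "", "date3", "lll"]
p = split_special(l, "")
```

Unlike the bool shortcut, this version works for any separator value, not only falsy ones.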
One possible implementation using itertools.takewhile:
>>> l
['data', 'more data', '', 'data 2', 'more data 2', 'danger', '', 'date3', 'lll']
>>> it_l = iter(l)
>>> from itertools import takewhile, dropwhile
>>> [[e] + list(takewhile(lambda e: e != "", it_l)) for e in it_l if e != ""]
[['data', 'more data'], ['data 2', 'more data 2', 'danger'], ['date3', 'lll']]
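The takewhile one-liner works because the outer loop and takewhile consume the same iterator. Spelled out as explicit loops, the idea might look like the sketch below. The function name split_with_shared_iterator and its parameters are hypothetical, chosen only for this illustration.

```python
def split_with_shared_iterator(items, sep=""):
    """Split items at every element equal to sep, using one shared iterator."""
    it = iter(items)
    result = []
    for first in it:          # outer loop pulls the next unread element
        if first == sep:
            continue          # skip a separator that starts a would-be group
        group = [first]
        for item in it:       # inner loop continues from the same iterator
            if item == sep:
                break         # the separator is consumed and discarded
            group.append(item)
        result.append(group)
    return result


l = ["data", "more data", "", "data 2", "more data 2", "danger", "", "date3", "lll"]
groups = split_with_shared_iterator(l)
```

Each element is read exactly once, so the whole split is a single pass over the list.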
Note: this is roughly as fast as using groupby:
>>> import timeit
>>> stmt_dsm = """
... [list(group) for k, group in groupby(l, lambda x: x == "") if not k]
... """
>>> stmt_ab = """
... it_l = iter(l)
... [[e] + list(takewhile(lambda e: e != "", it_l)) for e in it_l if e != ""]
... """
>>> t_ab = timeit.Timer(stmt = stmt_ab, setup = "from __main__ import l, dropwhile, takewhile")
>>> t_dsm = timeit.Timer(stmt = stmt_dsm, setup = "from __main__ import l, groupby")
>>> t_ab.timeit(100000)
1.6863486541265047
>>> t_dsm.timeit(100000)
1.5298066765462863
>>> t_ab.timeit(100000)
1.735611326163962