I want to split a string into pieces and also get the starting positions of the resulting pieces.
I can do this with the following code:
str_ = '  d     A7    g7'
flag_non_space_string_started = False
positions = []
for i, letter in enumerate(str_):
    if letter != ' ':
        if not flag_non_space_string_started:
            positions.append(i)
            flag_non_space_string_started = True
    else:
        flag_non_space_string_started = False
# this is what I want
print(str_.split())
print(positions)
# prints:
# ['d', 'A7', 'g7']
# [2, 8, 14]
Is there a shorter (more pythonic) way to get the positions?
You can use itertools.groupby with enumerate here. The key function applies not str.isspace to each character, which groups the characters at whitespace boundaries, so k is True for runs of non-whitespace characters and False for runs of whitespace, hence the if k condition. Each group is an iterator, so call next() on it to get the starting index together with the first character. To get the rest of the group, use a list comprehension and pass it to str.join to build a string. Don't forget to prepend the first character taken with next() to this string:
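from itertools import groupby

str_ = '  d     A7    g7'
# Group (index, char) pairs by whether the character is non-whitespace.
for k, g in groupby(enumerate(str_), lambda x: not x[1].isspace()):
    if k:
        # The first pair of the group carries the starting index.
        pos, first_item = next(g)
        print(pos, first_item + ''.join([x for _, x in g]))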
Output:
2 d
8 A7
14 g7
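To see why the if k check is needed, it may help to look at every group that groupby produces. The following is a minimal sketch on a shorter, made-up string; the names sample and groups are assumptions for illustration and do not come from the question.

```python
from itertools import groupby

# Hypothetical short input, chosen only to keep the groups easy to read.
sample = ' ab  c'

# Materialize each group so its key and (index, char) pairs can be inspected.
groups = [
    (k, list(g))
    for k, g in groupby(enumerate(sample), lambda x: not x[1].isspace())
]

for k, members in groups:
    print(k, members)
```

Groups whose key is False contain only whitespace, so skipping them leaves exactly the runs whose first index is wanted.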
If the solution above seems complicated, you can also use re.finditer. The match objects it yields have methods such as .start() and .group(), which return the start index of the match and the matched text, respectively.
import re

str_ = '  d     A7    g7'
for m in re.finditer(r'\S+', str_):
    index, item = m.start(), m.group()
    # now do something with index, item
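If only the two lists from the question are needed, the loop can be collapsed into list comprehensions. This is a sketch, assuming the input contains the runs of spaces implied by the positions [2, 8, 14]; the names matches and pieces are introduced here for illustration.

```python
import re

# Assumed input: spacing chosen so the pieces start at indices 2, 8 and 14.
str_ = '  d     A7    g7'

# Collect the match objects once, then read both values from them.
matches = list(re.finditer(r'\S+', str_))
positions = [m.start() for m in matches]
pieces = [m.group() for m in matches]

print(pieces)     # the same pieces that str_.split() returns for this input
print(positions)
```

Storing the matches in a list first avoids running the regular expression twice.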