
Python: split string and get position

I want to split a string into pieces and also get the starting positions of those pieces within the string.

I can do this with the following code:

str_ = '  d     A7    g7'
flag_non_space_string_started = False
positions = []
for i, letter in enumerate(str_):
    if letter != ' ':
        if not flag_non_space_string_started:
            positions.append(i)
            flag_non_space_string_started = True
    else:
        flag_non_space_string_started = False
# this is what I want
print(str_.split())
print(positions)
# prints:
# ['d', 'A7', 'g7']
# [2, 8, 14]

Is there a shorter (more pythonic) way to get the positions?

Tillmann Walther asked Mar 03 '15 10:03


1 Answer

You can use itertools.groupby with enumerate here. The key function applies not str.isspace to each character, so consecutive characters are grouped by whether they are whitespace: k is True for a run of non-whitespace characters and False for a run of whitespace, hence the if k condition. Each group is an iterator, so we call next() on it to get the starting index together with the first character. The rest of the group is then collected with a list comprehension and passed to str.join to build the remaining string. Don't forget to prepend the first character we already consumed to this string:

from itertools import groupby

str_ = '  d     A7    g7'

for k, g in groupby(enumerate(str_), lambda x: not x[1].isspace()):
    if k:
        pos, first_item = next(g)
        print(pos, first_item + ''.join([x for _, x in g]))

Output:

2 d
8 A7
14 g7
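
If you need the pairs as data rather than printed output, the same grouping can be wrapped in a small helper. This is only a sketch: the name split_with_positions is made up for illustration and is not part of itertools, and it materialises each group as a list instead of calling next() on it.

```python
from itertools import groupby


def split_with_positions(text):
    """Return a list of (start_index, word) pairs for whitespace-separated words."""
    result = []
    # Group consecutive (index, char) pairs by whether the char is non-whitespace.
    for is_word, group in groupby(enumerate(text), lambda pair: not pair[1].isspace()):
        if is_word:
            chars = list(group)   # [(index, char), ...] for one word
            start = chars[0][0]   # index of the word's first character
            result.append((start, ''.join(ch for _, ch in chars)))
    return result


print(split_with_positions('  d     A7    g7'))
# [(2, 'd'), (8, 'A7'), (14, 'g7')]
```

The positions alone are then [pos for pos, _ in split_with_positions(str_)].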

If the above solution seems complicated, you can also use re.finditer. The match objects it yields have methods such as .start() and .group(), which return the start index of the match and the matched text respectively.

import re

str_ = '  d     A7    g7'

for m in re.finditer(r'\S+', str_):
    index, item = m.start(), m.group()
    # now do something with index, item
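
To get exactly the two lists from the question, one option is to collect the match objects once and derive both lists from them. A minimal sketch, assuming words are separated by any whitespace (which is what \S+ and str.split() with no arguments both do):

```python
import re

str_ = '  d     A7    g7'

# Collect the match objects once, then derive both lists from them.
matches = list(re.finditer(r'\S+', str_))
words = [m.group() for m in matches]
positions = [m.start() for m in matches]

print(words)      # ['d', 'A7', 'g7']
print(positions)  # [2, 8, 14]
```

Here words is the same list that str_.split() returns, so the regex version can replace the split call as well.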

Ashwini Chaudhary answered Sep 26 '22 14:09