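I would like to split a string on ':' and ' ' characters. However, I would like to ignore two spaces '  ' and two colons '::', so that the second character of the pair is kept as a value instead of being treated as a separator. For example: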
text = "s:11011 i:11010 ::110011  :110010 d:11000"
should split into:
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']
After following the Regular Expression HOWTO on the Python website, I managed to come up with the following:
import re
regx = re.compile(r'([\s:]|[^\s\s]|[^::])')
regx.split(text)
However, this does not work as intended: it splits on the colons and spaces, but it still includes the ':' and ' ' delimiters in the result:
[s,:,11011, ,i,:,11010, ,:,:,110011, , :,110010, ,d,:,11000]
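The delimiters show up because re.split also returns the text matched by any capturing groups in the pattern, alongside the pieces between matches. A minimal illustration on a shortened input, using a simplified pattern (not the exact one above):

```python
import re

text = "s:11011 i:11010"

# With a capturing group, re.split also returns each matched delimiter.
with_group = re.split(r"([\s:])", text)
print(with_group)     # ['s', ':', '11011', ' ', 'i', ':', '11010']

# With a non-capturing group, only the pieces between delimiters are returned.
without_group = re.split(r"(?:[\s:])", text)
print(without_group)  # ['s', '11011', 'i', '11010']
```

Dropping the group removes the extra delimiters, but on its own it also discards the ':' and ' ' values that the doubled delimiters are supposed to keep.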
How can I fix this?
EDIT: In case of a double space, I only want one space to appear.
Note that this assumes your data has the format X:101010:
>>> re.findall(r'(.+?):(.+?)\b ?',text)
[('s', '11011'), ('i', '11010'), (':', '110011'), (' ', '110010'), ('d', '11000')]
Then chain them up:
>>> import itertools
>>> list(itertools.chain(*_))
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']
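The same findall-and-flatten approach as a standalone script, so it does not depend on the REPL's `_` variable. It assumes, as noted above, that every token has the form KEY:DIGITS; `itertools.chain.from_iterable(pairs)` is equivalent to `itertools.chain(*pairs)`.

```python
import itertools
import re

# Assumed input: KEY:DIGITS tokens separated by one space; a double space
# before ":" means the key itself is a space.
text = "s:11011 i:11010 ::110011  :110010 d:11000"

# The first (.+?) captures the key lazily up to a colon, the second captures
# the value up to a word boundary, and " ?" swallows one separating space.
pairs = re.findall(r"(.+?):(.+?)\b ?", text)

# Flatten [(k1, v1), (k2, v2), ...] into [k1, v1, k2, v2, ...].
result = list(itertools.chain.from_iterable(pairs))
print(result)
```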
>>> text = "s:11011 i:11010 ::110011  :110010 d:11000"
>>> [x for x in re.split(r":(:)?|\s(\s)?", text) if x]
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']
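Why the `if x` filter is needed: the pattern has two optional groups, and re.split inserts the value of every group after each piece, using None for a group that did not take part in the match. A sketch that prints the unfiltered list before cleaning it:

```python
import re

text = "s:11011 i:11010 ::110011  :110010 d:11000"

# Group 1 captures a second colon, group 2 a second whitespace character.
# Each match contributes two entries (one per group), usually None.
raw = re.split(r":(:)?|\s(\s)?", text)
print(raw[:6])    # ['s', None, None, '11011', None, None]
print(raw[12:18]) # ['', ':', None, '110011', None, ' ']

# Dropping None and empty strings keeps only the real tokens,
# including the ':' and ' ' captured from the doubled delimiters.
tokens = [x for x in raw if x]
print(tokens)
```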
Use the regex r"(?<=\d) |:(?=\d)" to split:
>>> text = "s:11011 i:11010 ::110011  :110010 d:11000"
>>> result = re.split(r"(?<=\d) |:(?=\d)", text)
>>> result
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']
This will split on:
- r"(?<=\d) " (note the trailing space): a space, when there is a digit to its left. This is checked with a lookbehind assertion.
- r":(?=\d)": a colon, when there is a digit to its right. This is checked with a lookahead assertion.
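To see what each half of the alternation contributes, here is a sketch that splits with each lookaround on its own. Because lookarounds are zero-width, the digit they check stays in the surrounding piece.

```python
import re

text = "s:11011 i:11010 ::110011  :110010 d:11000"

# Only spaces preceded by a digit: the second of the two spaces is not a split point.
by_space = re.split(r"(?<=\d) ", text)
print(by_space)  # ['s:11011', 'i:11010', '::110011', ' :110010', 'd:11000']

# Only colons followed by a digit: the first colon of "::" is not a split point.
by_colon = re.split(r":(?=\d)", text)
print(by_colon)  # ['s', '11011 i', '11010 :', '110011  ', '110010 d', '11000']

# Combining both with "|" gives the final token list.
combined = re.split(r"(?<=\d) |:(?=\d)", text)
print(combined)
```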