Python split a string using regex

I would like to split a string on ':' and ' ' characters. However, I would like to ignore double spaces '  ' and double colons '::'. For example,

text = "s:11011 i:11010 ::110011  :110010 d:11000"

should split into

[s,11011,i,11010,:,110011, ,110010,d,11000]

After following the Regular Expressions HOWTO on the Python website, I managed to come up with the following:

regx= re.compile('([\s:]|[^\s\s]|[^::])')
regx.split(text)

However, this does not work as intended: it splits on the colons and spaces, but it still includes the ':' and ' ' separators in the result.

[s,:,11011, ,i,:,11010, ,:,:,110011, , :,110010, ,d,:,11000]

How can I fix this?

EDIT: In the case of a double space, I only want one space to appear.

misterMan asked May 02 '13 05:05


3 Answers

Note that this assumes your data is formatted like X:101010:

>>> re.findall(r'(.+?):(.+?)\b ?',text)
[('s', '11011'), ('i', '11010'), (':', '110011'), (' ', '110010'), ('d', '11000')]

Then chain them up:

>>> list(itertools.chain(*_))
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']
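
For use outside the interactive interpreter, the two steps above could be sketched as a standalone script. This is only an illustration, assuming Python 3: `chain.from_iterable` is swapped in for `chain(*_)`, because `_` only holds the last result inside the REPL, and the names `pairs` and `flat` are made up for this example.

```python
import re
from itertools import chain

text = "s:11011 i:11010 ::110011  :110010 d:11000"

# Each match is a (key, value) tuple. The lazy groups stop at the first
# colon and at the word boundary after the value; the trailing " ?"
# consumes at most one separator space, so a second space is left over
# to become the next key.
pairs = re.findall(r'(.+?):(.+?)\b ?', text)

# Flatten the list of 2-tuples into one flat list of strings.
flat = list(chain.from_iterable(pairs))
print(flat)
```

As in the note above, this relies on every item having the key:value shape; input that breaks that shape would need a different pattern.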
Kabie answered Oct 05 '22 15:10


>>> text = "s:11011 i:11010 ::110011  :110010 d:11000"
>>> [x for x in re.split(r":(:)?|\s(\s)?", text) if x]
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']
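
A possible reading of why this works: when the pattern passed to `re.split` contains capturing groups, the text matched by those groups is included in the result, and a group that did not take part in a match contributes `None`. The sketch below, assuming Python 3, keeps the unfiltered list in a variable (the names `raw` and `tokens` are made up for this example) so the effect of the filter is visible.

```python
import re

text = "s:11011 i:11010 ::110011  :110010 d:11000"

# ":(:)?"   matches one colon, or two colons while capturing the second.
# "\s(\s)?" matches one space, or two spaces while capturing the second.
# The captured second character is what survives as the ':' or ' ' item.
raw = re.split(r":(:)?|\s(\s)?", text)

# raw starts with ['s', None, None, '11011', ...]: every split point adds
# one entry per capturing group, and unmatched groups show up as None.
print(raw)

# Dropping None and empty strings leaves only the wanted tokens.
tokens = [x for x in raw if x]
print(tokens)
```

The filter also discards the empty strings that appear where two delimiters are adjacent, so it would drop genuinely empty fields too if the data could contain them.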
Nolen Royalty answered Oct 05 '22 15:10


Use the regex (?<=\d) |:(?=\d) to split:

>>> text = "s:11011 i:11010 ::110011  :110010 d:11000"
>>> result = re.split(r"(?<=\d) |:(?=\d)", text)
>>> result
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']

This will split on:

(?<=\d) a space, when there is a digit on the left. To check this I use a lookbehind assertion.

:(?=\d) a colon, when there is a digit on the right. To check this I use a lookahead assertion.
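
To see what the two assertions contribute, a rough comparison with a plain character-class split may help. This is a sketch assuming Python 3 and assuming every value starts and ends with a digit, as in the sample text; the names `naive`, `SEPARATOR` and `fields` are made up for this example.

```python
import re

text = "s:11011 i:11010 ::110011  :110010 d:11000"

# Without lookarounds, every space and colon is a delimiter, so the
# ':' and ' ' keys are lost and empty strings appear in their place.
naive = re.split(r"[ :]", text)
print(naive)

# With lookarounds, a space only splits when a digit precedes it, and a
# colon only splits when a digit follows it. The assertions match a
# position without consuming any characters, so nothing extra is removed.
SEPARATOR = re.compile(r"(?<=\d) |:(?=\d)")
fields = SEPARATOR.split(text)
print(fields)
```

Because the pattern has no capturing groups, the result needs no filtering afterwards.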

stema answered Oct 05 '22 14:10