 

Non-consuming regular expression split in Python

Tags: python, regex

How can a string be split on a separator expression while leaving that separator on the preceding string?

>>> import re
>>> text = "This is an example. Is it made up of more than once sentence? Yes, it is."
>>> re.split(r"[.?!] ", text)
['This is an example', 'Is it made up of more than once sentence', 'Yes, it is.']

I would like the result to be:

['This is an example.', 'Is it made up of more than once sentence?', 'Yes, it is.']

So far I have only tried a lookahead assertion, but this fails to split at all.

Rob Young asked Apr 27 '11 09:04





2 Answers

>>> re.split(r"(?<=[.?!]) ", text)
['This is an example.', 'Is it made up of more than once sentence?', 'Yes, it is.']

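The crucial thing is the look-behind assertion, (?<=...): it requires the punctuation to come immediately before the space but does not consume it, so the split removes only the space.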
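To make the difference concrete, here is a minimal sketch that compares the consuming pattern from the question with the look-behind pattern from this answer. The `\s+` variant at the end is an assumption added for illustration, not part of the original answer. On Python versions before 3.7, `re.split` did not split on empty matches at all, which would explain why a bare lookahead fails to split.

```python
import re

text = "This is an example. Is it made up of more than once sentence? Yes, it is."

# Consuming split: the punctuation is part of the match, so it is dropped.
consumed = re.split(r"[.?!] ", text)

# Non-consuming split: the look-behind only checks that punctuation precedes
# the space; the match itself is just the space.
kept = re.split(r"(?<=[.?!]) ", text)

# Variant (an assumption, not from the answer): tolerate runs of whitespace,
# including newlines, between sentences.
kept_ws = re.split(r"(?<=[.?!])\s+", "One.  Two?\nThree!")

print(consumed)
print(kept)
print(kept_ws)
```

Note that Python's look-behind must have a fixed width, which a single character class such as `[.?!]` satisfies.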

Ignacio Vazquez-Abrams answered Oct 19 '22 23:10



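import re

text = "This is an example.A particular case.Made up of more "\
       "than once sentence?Yes, it is.But no blank !!!That's"\
       " a problem ????Yes.I think so! :)"

# Look-behind split: only splits where a space follows the punctuation
for x in re.split(r"(?<=[.?!]) ", text):
    print(repr(x))

print('\n')

# findall: each match is text ending in one punctuation mark,
# or trailing text that has no closing punctuation
for x in re.findall(r"[^.?!]*[.?!]|[^.?!]+(?=\Z)", text):
    print(repr(x))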

result

"This is an example.A particular case.Made up of more than once sentence?Yes, it is.But no blank !!!That's a problem ????Yes.I think so!"
':)'


'This is an example.'
'A particular case.'
'Made up of more than once sentence?'
'Yes, it is.'
'But no blank !'
'!'
'!'
"That's a problem ?"
'?'
'?'
'?'
'Yes.'
'I think so!'
' :)'
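As the output shows, the `findall` pattern emits each repeated punctuation mark as its own item. If runs such as `!!!` should stay attached to their sentence, one possible variant (an assumption for illustration, not part of the answer above) is to let the terminator repeat with `[.?!]+`. A minimal sketch on a shortened version of the same text:

```python
import re

text = ("This is an example.A particular case.But no blank !!!"
        "That's a problem ????Yes.I think so! :)")

# [^.?!]*[.?!]+ : optional non-terminator text followed by one or more terminators
# [^.?!]+\Z     : trailing text that has no terminator at all
sentences = re.findall(r"[^.?!]*[.?!]+|[^.?!]+\Z", text)

for s in sentences:
    print(repr(s))
```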


EDIT

Also

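import re

text = "! This is an example.A particular case.Made up of more "\
       "than once sentence?Yes, it is.But no blank !!!That's"\
       " a problem ????Yes.I think so! :)"

# The capturing group makes re.split keep each delimiter as its own item
res = re.split('([.?!])', text)

# Re-join each piece of text with the delimiter that follows it
print([''.join(res[i:i+2]) for i in range(0, len(res), 2)])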

gives

['!', ' This is an example.', 'A particular case.', 'Made up of more than once sentence?', 'Yes, it is.', 'But no blank !', '!', '!', "That's a problem ?", '?', '?', '?', 'Yes.', 'I think so!', ' :)']
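The split-then-rejoin idea can be wrapped in a small function. The sketch below is illustrative only: the name `split_keep_delimiter` and its `delimiters` parameter are hypothetical, and dropping empty pieces is an added assumption, not behaviour of the code above.

```python
import re

def split_keep_delimiter(text, delimiters=".?!"):
    """Split text after each delimiter character, keeping the delimiter.

    Hypothetical helper: the name and parameters are assumptions.
    """
    # A capturing group makes re.split return the delimiters as well, so
    # parts alternates: text, delimiter, text, delimiter, ..., text.
    pattern = "([" + re.escape(delimiters) + "])"
    parts = re.split(pattern, text)
    # Glue each text chunk to the delimiter that follows it.
    pieces = ["".join(parts[i:i + 2]) for i in range(0, len(parts), 2)]
    # Drop empty pieces, e.g. the one produced when text ends with a delimiter.
    return [p for p in pieces if p]

print(split_keep_delimiter("Hi.Bye?Ok"))
print(split_keep_delimiter("A.B."))
```

Unlike the look-behind split, this approach does not need whitespace after the punctuation, at the cost of splitting repeated marks such as `!!!` into separate items.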
eyquem answered Oct 19 '22 23:10
