Python Regex Split Keeps Split Pattern Characters

Tags: python, regex

The easiest way to explain this is with an example. I have this string: 'Docs/src/Scripts/temp', which I know how to split in two different ways:

re.split('/', 'Docs/src/Scripts/temp') -> ['Docs', 'src', 'Scripts', 'temp']

re.split('(/)', 'Docs/src/Scripts/temp') -> ['Docs', '/', 'src', '/', 'Scripts', '/', 'temp']

Is there a way to split on the forward slash but keep the slashes as part of the words? For example, I want the above string to become this:

['Docs/', '/src/', '/Scripts/', '/temp']

Any help would be appreciated!

asked Mar 16 '12 at 19:03 by user1274774


1 Answer

Interesting question. I would suggest something like this:

>>> 'Docs/src/Scripts/temp'.replace('/', '/\x00/').split('\x00')
['Docs/', '/src/', '/Scripts/', '/temp']

The idea is to first replace each / with two / characters separated by a special character that does not appear in the original string, and then split on that special character. I used a null byte ('\x00'), but you could use any other character that is guaranteed not to occur in your input.
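Wrapped up as a reusable function, the same replace-then-split trick might look like the sketch below. The name `split_keep_slashes`, the `sentinel` parameter, and the `ValueError` guard for inputs that already contain the sentinel are illustrative additions, not part of the one-liner above.

```python
def split_keep_slashes(path, sentinel="\x00"):
    """Split on '/' while keeping each slash on both neighbouring pieces.

    Hypothetical helper: a sketch of the replace-then-split trick.
    """
    # The trick only works if the sentinel never occurs in the input.
    if sentinel in path:
        raise ValueError("sentinel character occurs in the input string")
    # Turn every '/' into '/<sentinel>/', then cut at each sentinel.
    return path.replace("/", "/" + sentinel + "/").split(sentinel)


print(split_keep_slashes("Docs/src/Scripts/temp"))
# ['Docs/', '/src/', '/Scripts/', '/temp']
```

The guard matters because a stray sentinel in the input would silently produce an extra split point.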

Regex isn't a great fit here. Each / has to end up in two of the output pieces, but re.findall() never returns overlapping matches, and before Python 3.7 re.split() could not split on zero-length matches at all, so you would potentially need several passes over the string.
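On Python 3.7 or later, where re.split() does accept patterns that match the empty string, a zero-width lookbehind gets you halfway there, but it still takes a second pass to add the leading slashes. This is a sketch assuming Python 3.7+; the name `split_keep_slashes_re` is hypothetical.

```python
import re
from typing import List


def split_keep_slashes_re(path: str) -> List[str]:
    # Hypothetical helper; needs Python 3.7+ (re.split on empty matches).
    # (?<=/) matches the zero-width position right after each '/', so the
    # slash stays on the left-hand piece:
    # 'Docs/src/Scripts/temp' -> ['Docs/', 'src/', 'Scripts/', 'temp']
    pieces = re.split(r"(?<=/)", path)
    # Second pass: every piece after the first was preceded by a '/'.
    return pieces[:1] + ["/" + p for p in pieces[1:]]


print(split_keep_slashes_re("Docs/src/Scripts/temp"))
# ['Docs/', '/src/', '/Scripts/', '/temp']
```

The extra list comprehension is exactly the "second pass" that makes regex less attractive than the plain string approach.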

Also, re.split('/', s) does the same thing as s.split('/'), but the latter is more efficient because it does not go through the regex engine.
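A quick way to check both halves of that claim is to compare the results directly and time the two calls with the standard timeit module. This is only a rough benchmark sketch; the call count is arbitrary and the absolute timings depend on your machine and Python version.

```python
import re
import timeit

s = "Docs/src/Scripts/temp"

# '/' is not a regex metacharacter, so both calls split identically.
assert re.split("/", s) == s.split("/")

# Rough timing comparison; numbers vary by machine and Python version.
n = 100_000
t_str = timeit.timeit(lambda: s.split("/"), number=n)
t_re = timeit.timeit(lambda: re.split("/", s), number=n)
print(f"str.split: {t_str:.3f}s  re.split: {t_re:.3f}s  ({n} calls each)")
```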

answered Sep 29 '22 at 04:09 by Andrew Clark