tokenize a string keeping delimiters in Python

Is there any equivalent to str.split in Python that also returns the delimiters?

I need to preserve the whitespace layout for my output after processing some of the tokens.

Example:

>>> s="\tthis is an  example"
>>> print s.split()
['this', 'is', 'an', 'example']

>>> print what_I_want(s)
['\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']

Thanks!

asked Nov 30 '09 by fortran

2 Answers

How about

import re
splitter = re.compile(r'(\s+|\S+)')
splitter.findall(s)
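
For instance, applied to the string from the question (a quick check, not part of the original answer), this should return exactly the token list the question asks for:

>>> import re
>>> s = "\tthis is an  example"
>>> re.compile(r'(\s+|\S+)').findall(s)
['\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']

The pattern matches either a run of whitespace or a run of non-whitespace, so every character of the input lands in exactly one token.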
answered Nov 02 '22 by Jonathan Feinberg

>>> re.compile(r'(\s+)').split("\tthis is an  example")
['', '\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']
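
A side note (not part of the original answer): re.split keeps a leading empty string when the input starts with the delimiter, as above. If that gets in the way, a small filter should drop it, and joining the tokens back should reproduce the original string:

>>> import re
>>> s = "\tthis is an  example"
>>> tokens = [t for t in re.split(r'(\s+)', s) if t]
>>> tokens
['\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']
>>> ''.join(tokens) == s
True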
answered Nov 02 '22 by Denis Otkidach