
Python: How can I include the delimiter(s) in a string split? [duplicate]

I would like to split a string on multiple delimiters but keep the delimiters in the resulting list. I think this is a useful thing to do as an initial step of parsing any kind of formula, and I suspect there is a nice Python solution.

Someone asked a similar question in Java here.

For example, a typical split looks like this:

>>> s='(twoplusthree)plusfour'
>>> s.split('plus')
['(two', 'three)', 'four']

But I'm looking for a nice way to add the plus back in (or retain it):

['(two', 'plus', 'three)', 'plus', 'four']

Ultimately I'd like to do this for each operator and bracket, so if there's a way to get

['(', 'two', 'plus', 'three', ')', 'plus', 'four']

all in one go, then all the better.

Bill asked Jan 18 '14 18:01



3 Answers

You can do that with Python's re module.

import re
s='(twoplusthree)plusfour'
list(filter(None, re.split(r"(plus|[()])", s)))

You can leave out the list() call if you only need an iterator (in Python 3, filter returns a lazy iterator).
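
The same idea can be generalized to an arbitrary set of delimiters. The following is only a sketch: the helper name split_keep and its parameters are made up for illustration, and it assumes every delimiter is a literal string (hence re.escape) rather than a regex.

```python
import re

def split_keep(text, delimiters):
    """Split text on any of the literal delimiters, keeping them in the result.

    split_keep is a hypothetical helper, not part of the re module.
    """
    # Try longer delimiters first so one that is a prefix of another
    # (e.g. '*' and '**') does not shadow the longer one.
    ordered = sorted(delimiters, key=len, reverse=True)
    # Escape each delimiter so characters like '(' are matched literally,
    # and wrap the alternation in a capturing group so re.split keeps matches.
    pattern = "(" + "|".join(re.escape(d) for d in ordered) + ")"
    # Drop the empty strings re.split yields next to adjacent delimiters.
    return [tok for tok in re.split(pattern, text) if tok]

tokens = split_keep('(twoplusthree)plusfour', ['plus', '(', ')'])
print(tokens)
```

For the string from the question this should give the fully tokenized list the asker wanted, with each bracket and each plus as its own element.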

Cu3PO42 answered Nov 05 '22 04:11


import re
s = '(twoplusthree)plusfour'
l = re.split(r"(plus|\(|\))", s)
a = [x for x in l if x != '']
print(a)

output:

['(', 'two', 'plus', 'three', ')', 'plus', 'four']

jgritty answered Nov 05 '22 05:11


Here is an easy way using re.split:

import re

s = '(twoplusthree)plusfour'
re.split('(plus)', s)

Output:

['(two', 'plus', 'three)', 'plus', 'four']

re.split is very similar to str.split, except that you pass a regex pattern instead of a literal delimiter. The trick here is to put () around the pattern: when the pattern contains a capturing group, the text matched by that group is also returned in the result list.

Bear in mind that you'll get empty strings if there are two consecutive occurrences of the delimiter pattern, or if the string starts or ends with a delimiter.
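
To illustrate the empty-string caveat, here is a small sketch. The input strings are made up for the demonstration; the filtering step is the same one used in the other answers.

```python
import re

# Two consecutive delimiters leave an empty string between them.
doubled = re.split('(plus)', 'twoplusplusfour')
print(doubled)   # ['two', 'plus', '', 'plus', 'four']

# A delimiter at the very start leaves an empty string in front.
leading = re.split('(plus)', 'plusfour')
print(leading)   # ['', 'plus', 'four']

# Filtering on truthiness removes the empty strings but keeps the delimiters.
cleaned = [part for part in doubled if part]
print(cleaned)   # ['two', 'plus', 'plus', 'four']
```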

Alexander Stefanov answered Nov 05 '22 05:11