
Parse string in python

Tags: python, regex

I would like to turn this:

mystr = '  foo1   (foo2 foo3 (foo4))' 

into:

['foo1','foo2 foo3 (foo4)']

So basically I have to split on a run of spaces/tabs and on the outermost parentheses.

I have seen that the re package's split function can handle several delimiters (Python: Split string with multiple delimiters), but I cannot work out the right approach for parsing this kind of string.

What would be the simplest, most Pythonic approach?

M.E. asked Mar 09 '23 06:03


2 Answers

As far as I can tell, this does what you want and is pretty simple. It uses slicing to isolate the first word and the part between the outermost parentheses, and calls strip a few times to deal with the extra spaces. It may seem a little verbose, but if the task can be accomplished with such simple string operations, I feel that more complicated parsing is unnecessary (although I may have misunderstood the requirements). Note that this tolerates any number of spaces between the two parts; because it searches for a space character specifically, a tab separator would need a different split.

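mystr = '  foo1   (foo2 foo3 (foo4))'
mystr = mystr.strip()         # drop leading/trailing whitespace
i = mystr.index(' ')          # position of the first space after the first word
a = mystr[:i].strip()         # the first word
b = mystr[i:].strip()[1:-1]   # the rest, minus the outermost parentheses
print([a, b])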

with output

['foo1', 'foo2 foo3 (foo4)']

I'm still not entirely sure this is what you want, though. Let me know if it works or what needs changing.
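
The question mentions tabs as well as spaces, so a variant built on str.split may be a safer starting point. The following is only a sketch and not part of the original answer: the function name split_head_and_group is made up for illustration, and it assumes the input is always one word followed by a single parenthesised group.

```python
def split_head_and_group(text):
    """Hypothetical helper: return [first_word, contents_of_outer_parentheses]."""
    # With sep=None, str.split treats any run of whitespace (spaces, tabs,
    # newlines) as one separator and ignores leading whitespace.
    # maxsplit=1 leaves everything after the first word in one piece.
    # Unpacking raises ValueError if the input contains only one word.
    head, rest = text.split(None, 1)
    rest = rest.strip()
    # Assumption: the remainder is wrapped in exactly one outer pair of parentheses.
    if not (rest.startswith('(') and rest.endswith(')')):
        raise ValueError('expected a parenthesised group after the first word')
    return [head, rest[1:-1]]


print(split_head_and_group('  foo1   (foo2 foo3 (foo4))'))
print(split_head_and_group('\tfoo1 \t (foo2 foo3 (foo4))'))
```

Both calls should give the same two-element list as above, since the separator is no longer tied to the space character.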

Izaak van Dongen answered Mar 19 '23 21:03


If the structure of your string is as rigidly defined as you say, you can use a regular expression to parse it pretty easily:

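import re

mystr = '  foo1   (foo2 foo3 (foo4))'

# (\S+)     the first word
# \s+       the run of spaces/tabs between the two parts
# \((.*)\)  everything between the first '(' after that run and the last ')' (greedy)
pattern = r'(\S+)\s+\((.*)\)'
match = re.search(pattern, mystr)  # None if the input does not fit the pattern
results = match.groups() # ('foo1', 'foo2 foo3 (foo4)')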

Be careful with this approach, though, if your real input is not as well defined as you have suggested in your question. Regular expressions can only parse regular languages, and the way parentheses usually nest is not "regular". In this question you only care about handling a single set of parentheses (the outermost), so a simple greedy match works. It might be hard or impossible to adapt this solution to other input formats, even if they seem very similar!
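
To make that caveat concrete, here is a sketch of one alternative that tracks nesting depth by hand instead of relying on a greedy match. It is an illustration under assumptions rather than a definitive parser: the helper name extract_outer_group is hypothetical, and it only looks at the first parenthesised group in the string.

```python
import re


def extract_outer_group(text):
    """Hypothetical helper: return (head, inner) for the first balanced group."""
    start = text.index('(')  # raises ValueError if there is no '(' at all
    depth = 0
    for pos in range(start, len(text)):
        ch = text[pos]
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:
                # Found the ')' that closes the first '('.
                return text[:start].strip(), text[start + 1:pos]
    raise ValueError('unbalanced parentheses')


# Same result as the regex on the input from the question.
print(extract_outer_group('  foo1   (foo2 foo3 (foo4))'))
# ('foo1', 'foo2 foo3 (foo4)')

# A similar-looking input with a second group: the greedy regex runs on to
# the last ')', while the depth counter stops at the matching one.
tricky = 'foo1 (a (b)) (c)'
print(re.search(r'(\S+)\s+\((.*)\)', tricky).groups())  # ('foo1', 'a (b)) (c')
print(extract_outer_group(tricky))                      # ('foo1', 'a (b)')
```

The counter is the part a plain regular expression cannot express: it has to remember how many parentheses are still open.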

Blckknght answered Mar 19 '23 20:03