I have a string which looks like this:
'(a (b (c d e f)) g)'
I want to turn it into a nested list like this:
['a', ['b', ['c', 'd', 'e', 'f']], 'g']
I used this function:
import re

def tree_to_list(text, left=r'[(]', right=r'[)]', sep=r','):
    pat = r'({}|{}|{})'.format(left, right, sep)
    tokens = re.split(pat, text)
    stack = [[]]
    for x in tokens:
        if not x or re.match(sep, x): continue
        if re.match(left, x):
            stack[-1].append([])
            stack.append(stack[-1][-1])
        elif re.match(right, x):
            stack.pop()
            if not stack:
                raise ValueError('error: opening bracket is missing')
        else:
            stack[-1].append(x)
    if len(stack) > 1:
        print(stack)
        raise ValueError('error: closing bracket is missing')
    return stack.pop()
But the result is not what I expected. There are no commas between the strings:
['a', ['b', ['c' 'd' 'e' 'f']], 'g']
Could you please help me with that?
You can use recursion with a generator:
import re
data = '(a (b (c d e f)) g)'
def group(d):
    a = next(d, ')')
    if a != ')':
        yield list(group(d)) if a == '(' else a
        yield from group(d)
print(next(group(iter(re.findall(r'\w+|[()]', data)))))
Output:
['a', ['b', ['c', 'd', 'e', 'f']], 'g']
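Note that the generator above treats a missing closing bracket as if it were present, because next(d, ')') falls back to ')' when the tokens run out. If you would rather get an error for unbalanced input, the same recursive idea can be written as a plain function. This is only a sketch: the names parse and read_list are made up for illustration, and it assumes the input is a single parenthesized expression whose items match \w+.

```python
import re

def parse(text):
    """Parse '(a (b c))' into ['a', ['b', 'c']]; raise ValueError if unbalanced."""
    # Same tokenizer as the generator answer: words and single brackets.
    tokens = iter(re.findall(r'\w+|[()]', text))

    def read_list():
        # Called right after an opening '(' has been consumed.
        items = []
        for tok in tokens:  # nested calls share and advance the same iterator
            if tok == '(':
                items.append(read_list())
            elif tok == ')':
                return items
            else:
                items.append(tok)
        # Tokens ran out before the matching ')' was seen.
        raise ValueError('closing bracket is missing')

    if next(tokens, None) != '(':
        raise ValueError('input must start with an opening bracket')
    result = read_list()
    if next(tokens, None) is not None:
        raise ValueError('unexpected text after the closing bracket')
    return result

print(parse('(a (b (c d e f)) g)'))
```

With the sample string this prints the same nested list as above, while inputs such as '(a (b' or '(a))' raise ValueError instead of being silently accepted.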
You can use string replacements to turn the input into the source text of the desired Python value, and then ast.literal_eval to turn that text into the value itself:
>>> import ast, re
>>> data = '(a (b (c d e f)) g)'
>>> s = re.sub(r'(\w+)', r'"\1"', data) # quote words
>>> s = re.sub(r'\s+', ',', s) # whitespace to comma
>>> s = s.replace('(', '[').replace(')', ']') # () -> []
>>> ast.literal_eval(s)
['a', ['b', ['c', 'd', 'e', 'f']], 'g']
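For comparison, the stack-based function from the question can be kept almost unchanged. One likely cause of the trouble is that its default sep is a comma while the sample input separates items with whitespace, so a run like 'c d e f' is never split. The sketch below changes the default to \s+ and unwraps the extra outer list that the initial stack entry adds. The unwrapping rule at the end is my own assumption about the result you want, not something taken from the original code.

```python
import re

def tree_to_list(text, left=r'[(]', right=r'[)]', sep=r'\s+'):
    # The capture group makes re.split keep the brackets and separators as tokens.
    pat = r'({}|{}|{})'.format(left, right, sep)
    stack = [[]]
    for x in re.split(pat, text):
        if not x or re.fullmatch(sep, x):
            continue  # skip empty strings and separators
        if re.fullmatch(left, x):
            stack[-1].append([])         # start a new sublist
            stack.append(stack[-1][-1])  # and make it the current list
        elif re.fullmatch(right, x):
            stack.pop()
            if not stack:
                raise ValueError('opening bracket is missing')
        else:
            stack[-1].append(x)
    if len(stack) > 1:
        raise ValueError('closing bracket is missing')
    root = stack.pop()
    # Assumption: when the whole input is one bracketed expression,
    # return that expression rather than a list wrapping it.
    if len(root) == 1 and isinstance(root[0], list):
        return root[0]
    return root

print(tree_to_list('(a (b (c d e f)) g)'))
```

Keeping left, right and sep as parameters means the same function still handles comma-separated input if you pass sep=r',\s*'.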