Parsing nested indented text into lists
Hi,
maybe someone can give me a start help.
I have nested indented txt similar to this. I should parse that into a nested list structure like
TXT = r"""
Test1
NeedHelp
GotStuck
Sometime
NoLuck
NeedHelp2
StillStuck
GoodLuck
"""
Nested_Lists = ['Test1',
['NeedHelp',
['GotStuck',
['Sometime',
'NoLuck']]],
['NeedHelp2',
['StillStuck',
'GoodLuck']]
]
Nested_Lists = ['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']]], ['NeedHelp2', ['StillStuck', 'GoodLuck']]]
Any help for python3 would be appriciated
You could exploit Python tokenizer to parse the indented text:
from tokenize import NAME, INDENT, DEDENT, tokenize
def parse(file):
stack = [[]]
lastindent = len(stack)
def push_new_list():
stack[-1].append([])
stack.append(stack[-1][-1])
return len(stack)
for t in tokenize(file.readline):
if t.type == NAME:
if lastindent != len(stack):
stack.pop()
lastindent = push_new_list()
stack[-1].append(t.string) # add to current list
elif t.type == INDENT:
lastindent = push_new_list()
elif t.type == DEDENT:
stack.pop()
return stack[-1]
Example:
from io import BytesIO
from pprint import pprint
pprint(parse(BytesIO(TXT.encode('utf-8'))), width=20)
['Test1',
['NeedHelp',
['GotStuck',
['Sometime',
'NoLuck']]],
['NeedHelp2',
['StillStuck',
'GoodLuck']]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With