I have an indented text file that will be used to build a tree. Each line represents a node, and indents represent depth as well as node the current node is a child of.
For example, a file might look like
ROOT Node1 Node2 Node3 Node4 Node5 Node6
Which indicates that ROOT contains three children: 1, 5, and 6, Node1 has one child: 2, and Node2 has one child: 3, etc.
I have come up with a recursive algorithm and have programmed it and it works, but it's kind of ugly and especially treats the example above very crudely (when going from node 4 to node 5)
It uses "indent count" as the basis for recursion, so if the number of indents = current depth + 1, I would go one level deeper. But this means when I read a line with less indents, I have to come back up one level at a time, checking the depth each time.
Here is what I have
def _recurse_tree(node, parent, depth):
tabs = 0
while node:
tabs = node.count("\t")
if tabs == depth:
print "%s: %s" %(parent.strip(), node.strip())
elif tabs == depth + 1:
node = _recurse_tree(node, prev, depth+1)
tabs = node.count("\t")
#check if we have to surface some more
if tabs == depth:
print "%s: %s" %(parent.strip(), node.strip())
else:
return node
else:
return node
prev = node
node = inFile.readline().rstrip()
inFile = open("test.txt")
root = inFile.readline().rstrip()
node = inFile.readline().rstrip()
_recurse_tree(node, root, 1)
Right now I am just printing out the nodes to verify that the parent node is correct for each line, but maybe there is a cleaner way to do it? Especially the case in the elif block when I'm coming back from each recursion call.
If you don't insist on recursion, this works too:
from itertools import takewhile
is_tab = '\t'.__eq__
def build_tree(lines):
lines = iter(lines)
stack = []
for line in lines:
indent = len(list(takewhile(is_tab, line)))
stack[indent:] = [line.lstrip()]
print stack
source = '''ROOT
\tNode1
\t\tNode2
\t\t\tNode3
\t\t\t\tNode4
\tNode5
\tNode6'''
build_tree(source.split('\n'))
Result:
['ROOT']
['ROOT', 'Node1']
['ROOT', 'Node1', 'Node2']
['ROOT', 'Node1', 'Node2', 'Node3']
['ROOT', 'Node1', 'Node2', 'Node3', 'Node4']
['ROOT', 'Node5']
['ROOT', 'Node6']
The big issue is the "lookahead" that I think caused the ugliness in question. It can be shortened slightly:
def _recurse_tree(parent, depth, source):
last_line = source.readline().rstrip()
while last_line:
tabs = last_line.count('\t')
if tabs < depth:
break
node = last_line.strip()
if tabs >= depth:
if parent is not None:
print "%s: %s" %(parent, node)
last_line = _recurse_tree(node, tabs+1, source)
return last_line
inFile = open("test.txt")
_recurse_tree(None, 0, inFile)
Since we're talking recursion, I took pains to avoid any global variables (source
and last_line
). It would be more pythonic to make them members on some parser object.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With