Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

converting a string to a tree structure in python

Tags:

python

I have a string in python of the following form:

line a
line b
  line ba
  line bb
    line bba
  line bc
line c
  line ca
    line caa
line d

You can get the idea. It actually takes a very similar form to python code itself, in that there is a line, and below that line, indentations indicate part of a block, headed by the most recent line of a lesser indentation.

What I need to do is parse this code into a tree sructure, such that each root level line is the key of a dictionary, and its value is a dictionary representing all sublines. so the above would be:

{
'line a' => {},
'line b' => {
  'line ba' => {},
  'line bb' => {
    'line bba' => {}
    },
  'line bc' => {}
  },
'line c' => {
  'line ca' => {
    'line caa' => {}
    },
  },
'line d' => {}
}

here's what I've got:

def parse_message_to_tree(message):
    buf = StringIO(message)
    return parse_message_to_tree_helper(buf, 0)

def parse_message_to_tree_helper(buf, prev):
    ret = {}
    for line in buf:
        line = line.rstrip()
        index = len(line) - len(line.lstrip())
        print (line + " => " + str(index))
        if index > prev:
            ret[line.strip()] = parse_message_to_tree_helper(buf, index)
        else:
            ret[line.strip()] = {}

    return ret

The print shows lines that are left stripped and indexes of 0. I didn't think lstrip() was a mutator, but either way the index should still be accurate.

Any suggestions are helpful.

EDIT: Not sure what was going wrong before, but I tried again and it is closer to working, but still not quite right. Here's what I have now:

{'line a': {},
 'line b': {},
 'line ba': {'line bb': {},
             'line bba': {'line bc': {},
                          'line c': {},
                          'line ca': {},
                          'line caa': {},
                          'line d': {}}}}
like image 433
ewok Avatar asked Aug 19 '15 16:08

ewok


2 Answers

Like already noted before str.lstrip() is not a mutator, The index is coming accurate in my system as well.

But the issue is that by the time you realize that the index for the line has increased, line is actually point to the increased index line , example , in first case, we note that index for line increases at line ba , so line points to line ba , and then in your if condition , you do -

ret[line.strip()] = parse_message_to_tree_helper(buf, index)

This is wrong, because you would be setting whatever is returned by parse_message_to_tree_helper() to line ba , not its actual parent.

Also, once you recurse inside the function, you do not come out unless the file has completely been read, but the level in which a certain line is stored in dictionary depends on it coming out of recursion when the indentation has decreased.

I am not sure, if there are any inbuilt libraries that will help you do this , but a code that I was able to come up with (based a lot on your code) -

def parse_message_to_tree(message):
    buf = StringIO(message)
    return parse_message_to_tree_helper(buf, 0, None)[0]

def parse_message_to_tree_helper(buf, prev, prevline):
    ret = {}
    index = -1
    for line in buf:
        line = line.rstrip()
        index = len(line) - len(line.lstrip())
        print (line + " => " + str(index))
        if index > prev:
            ret[prevline.strip()],prevline,index = parse_message_to_tree_helper(buf, index, line)
            if index < prev:
                return ret,prevline,index
            continue
        elif not prevline:
            ret[line.strip()] = {}
        else:
            ret[prevline.strip()] = {}
        if index < prev:
            return ret,line,index
        prevline = line
    if index == -1:
        ret[prevline.strip()] = {}
        return ret,None,index
    if prev == index:
        ret[prevline.strip()] = {}
    return ret,None,0

Example/Demo -

>>> print(s)
line a
line b
  line ba
  line bb
    line bba
  line bc
line c
  line ca
    line caa
>>> def parse_message_to_tree(message):
...     buf = StringIO(message)
...     return parse_message_to_tree_helper(buf, 0, None)[0]
...
>>> def parse_message_to_tree_helper(buf, prev, prevline):
...     ret = {}
...     index = -1
...     for line in buf:
...         line = line.rstrip()
...         index = len(line) - len(line.lstrip())
...         print (line + " => " + str(index))
...         if index > prev:
...             ret[prevline.strip()],prevline,index = parse_message_to_tree_helper(buf, index, line)
...             if index < prev:
...                 return ret,prevline,index
...             continue
...         elif not prevline:
...             ret[line.strip()] = {}
...         else:
...             ret[prevline.strip()] = {}
...         if index < prev:
...             return ret,line,index
...         prevline = line
...     if index == -1:
...         ret[prevline.strip()] = {}
...         return ret,None,index
...     if prev == index:
...         ret[prevline.strip()] = {}
...     return ret,None,0
...
>>> pprint.pprint(parse_message_to_tree(s))
line a => 0
line b => 0
  line ba => 2
  line bb => 2
    line bba => 4
  line bc => 2
line c => 0
  line ca => 2
    line caa => 4
{'line a': {},
 'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}},
 'line c': {'line ca': {'line caa': {}}}}
>>> s = """line a
... line b
...   line ba
...   line bb
...     line bba
...   line bc
... line c
...   line ca
...     line caa
... line d"""
>>> pprint.pprint(parse_message_to_tree(s))
line a => 0
line b => 0
  line ba => 2
  line bb => 2
    line bba => 4
  line bc => 2
line c => 0
  line ca => 2
    line caa => 4
line d => 0
{'line a': {},
 'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}},
 'line c': {'line ca': {'line caa': {}}},
 'line d': {}}

You would need to test the code for any more bugs or some missed cases.

like image 141
Anand S Kumar Avatar answered Sep 30 '22 11:09

Anand S Kumar


lstrip() isn't a mutator, see documentation:

string.lstrip(s[, chars])

Return a copy of the string with leading characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the beginning of the string this method is called on.

And your code seems to work with that sample text on my machine.

like image 43
gmoshkin Avatar answered Sep 30 '22 11:09

gmoshkin