Parse indented text tree in Java

Question

I have an indented file that I need to parsed using java, I need some way to place this in a Section class as shown below

    root
     root1
       text1
         text1.1
         text1.2
       text2
         text2.1
         text2.2

     root2
       text1
         text1.1
         text1.2
       text2
         text2.1
         text2.2.2

I have the class for putting the indented stuff it looks like

public class Section 
{

    private List<Section> children;
    private String text;
    private int depth;
    public Section(String t)
    {
       text =t;
    }

    public List<Section> getChildren()
    {
        if (children == null)
      {
            children = new ArrayList<Section>();
       }
        return children;
}

public void setChildren(List<Section> newChildren)
{
    if (newChildren == null) {
        children = newChildren;
    } else {
        if (children == null) {
            children = new ArrayList<Section>();
        }
        for (Section child : newChildren) {
            this.addChild(child);
        }
    }
}

public void addChild(Section child)
{
    if (children == null) {
        children = new ArrayList<Section>();
    }
    if (child != null) {
        children.add(child);
    }
}

public String getText()
{
    return text;
}

public void setText(String newText)
{
    text =newText;
}
public String getDepth()
{
    return depth;
}

 public void setDepth(int newDepth)
 {
    depth = newDepth;
 }
}

I need some way to parse the file and place it in expected result which us a Section object which would look like below

Section= 

Text="Root"
Children
Child1: Text= "root1" 

        Child1: "text1"
            Child1="Text 1.1"
            Child2="Text 1.2"
        Child2: "text2"
            Child1="Text 2.1"
            Child2="Text 2.2"
            Children
Child2: Text= "root2" 
        Child1: "text1"
            Child1="Text 1.1"
            Child2="Text 1.2"
        Child2: "text2"
            Child1="Text 2.1"
            Child2="Text 2.2"


Here is some code that I have started
   int indentCount=0;
   while(String text = reader.readline()
   {
   indentCount=countLeadingSpaces(String word);
   //TODO create the section here
   }


public static int countLeadingSpaces(String word)
{
    int length=word.length();
    int count=0;

   for(int i=0;i<length;i++)
   {
       char first = word.charAt(i); 
        if(Character.isWhitespace(first))
        {
            count++;           
        }
        else
        {
            return count;
        }
   }

 return count;

}

Chthonic Project · Accepted Answer

I added a parent pointer as well. Maybe the text can be parsed without it, but parent pointers make it easier. First of all, you need to have more constructors:

static final int root_depth = 4; // assuming 4 whitespaces precede the tree root

public Section(String text, int depth) {
    this.text     = text;
    this.depth    = depth;
    this.children = new ArrayList<Section>();
    this.parent   = null;
}

public Section(String text, int depth, Section parent) {
    this.text     = text;
    this.depth    = depth;
    this.children = new ArrayList<Section>();
    this.parent   = parent;
}

Then, when you start parsing the file, read it line by line:

Section prev = null;
for (String line; (line = bufferedReader.readLine()) != null; ) {
    if (prev == null && line begins with root_depth whitespaces) {
        Section root = new Section(text_of_line, root_depth);
        prev = root;
    }
    else {
        int t_depth = no. of whitespaces at the beginning of this line;
        if (t_depth > prev.getDepth())
            // assuming that empty sections are not allowed
            Section t_section = new Section(text_of_line, t_depth, prev);
            prev.addChild(t_section);
        }
        else if (t_depth == prev.getDepth) {
            Section t_section = new Section(text_of_line, t_depth, prev.getParent());
            prev.getParent().addChild(t_section);
        }
        else {
            while (t_depth < prev.getDepth()) {
                prev = prev.getParent();
            }
            // at this point, (t_depth == prev.getDepth()) = true
            Section t_section = new Section(text_of_line, t_depth, prev.getParent());
            prev.getParent().addChild(t_section);
        }
    }
}

I have glossed over some finer points of the pseudo-code, but I think you get the overall idea of how to go about this parsing. Do remember to implement the methods addChild(), getDepth(), getParent(), etc.

Max Seo · Answer

Surprisingly complex problem... but here's a pseudocode

intialize a stack
push first line to stack
while (there are more lines to read) {
 S1 = top of stack // do not pop off yet
 S2 = read a line
 if depth of S1 < depth of S2 {
  add S2 as child of S1
  push S2 into stack
 }
 else {
  while (depth of S1 >= depth of S2 AND there are at least 2 elements in stack) {
   pop stack
   S1 = top of stack // do not pop
  }
  add S2 as child of S1
  push S2 into stack
 }
}
return bottom element of stack

where depth is the # leading whitespaces. You might have to either modify or wrap the Section class to store the depth of the line.

Uwe Keim · Answer

An implementation in C#

Based on this answer, I've created a C# solution.

It allows for multiple roots and assumes an input structure like:

Test
    A
    B
    C
        C1
        C2
    D
Something
    One
    Two
    Three

An example usage of the code would be:

var lines = new[]
{
    "Test",
    "	A",
    "	B",
    "	C",
    "		C1",
    "		C2",
    "	D",
    "Something",
    "	One",
    "	Two",
    "	Three"
};

var roots = IndentedTextToTreeParser.Parse(lines, 0, '	');

var dump = IndentedTextToTreeParser.Dump(roots);
Console.WriteLine(dump);

You can specify the root indention (zero by default) and also the idention character (a tab by default).

The full code:

namespace MyNamespace
{
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Text;

    public static class IndentedTextToTreeParser
    {
        // https://stackoverflow.com/questions/21735468/parse-indented-text-tree-in-java

        public static List<IndentTreeNode> Parse(IEnumerable<string> lines, int rootDepth = 0, char indentChar = '	')
        {
            var roots = new List<IndentTreeNode>();

            // --

            IndentTreeNode prev = null;

            foreach (var line in lines)
            {
                if (string.IsNullOrEmpty(line.Trim(indentChar)))
                    throw new Exception(@"Empty lines are not allowed.");

                var currentDepth = countWhiteSpacesAtBeginningOfLine(line, indentChar);

                if (currentDepth == rootDepth)
                {
                    var root = new IndentTreeNode(line, rootDepth);
                    prev = root;

                    roots.Add(root);
                }
                else
                {
                    if (prev == null)
                        throw new Exception(@"Unexpected indention.");
                    if (currentDepth > prev.Depth + 1)
                        throw new Exception(@"Unexpected indention (children were skipped).");

                    if (currentDepth > prev.Depth)
                    {
                        var node = new IndentTreeNode(line.Trim(), currentDepth, prev);
                        prev.AddChild(node);

                        prev = node;
                    }
                    else if (currentDepth == prev.Depth)
                    {
                        var node = new IndentTreeNode(line.Trim(), currentDepth, prev.Parent);
                        prev.Parent.AddChild(node);

                        prev = node;
                    }
                    else
                    {
                        while (currentDepth < prev.Depth) prev = prev.Parent;

                        // at this point, (currentDepth == prev.Depth) = true
                        var node = new IndentTreeNode(line.Trim(indentChar), currentDepth, prev.Parent);
                        prev.Parent.AddChild(node);
                    }
                }
            }

            // --

            return roots;
        }

        public static string Dump(IEnumerable<IndentTreeNode> roots)
        {
            var sb = new StringBuilder();

            foreach (var root in roots)
            {
                doDump(root, sb, @"");
            }

            return sb.ToString();
        }

        private static int countWhiteSpacesAtBeginningOfLine(string line, char indentChar)
        {
            var lengthBefore = line.Length;
            var lengthAfter = line.TrimStart(indentChar).Length;
            return lengthBefore - lengthAfter;
        }

        private static void doDump(IndentTreeNode treeNode, StringBuilder sb, string indent)
        {
            sb.AppendLine(indent + treeNode.Text);
            foreach (var child in treeNode.Children)
            {
                doDump(child, sb, indent + @"    ");
            }
        }
    }

    [DebuggerDisplay(@"{Depth}: {Text} ({Children.Count} children)")]
    public class IndentTreeNode
    {
        public IndentTreeNode(string text, int depth = 0, IndentTreeNode parent = null)
        {
            Text = text;
            Depth = depth;
            Parent = parent;
        }

        public string Text { get; }
        public int Depth { get; }
        public IndentTreeNode Parent { get; }
        public List<IndentTreeNode> Children { get; } = new List<IndentTreeNode>();

        public void AddChild(IndentTreeNode child)
        {
            if (child != null) Children.Add(child);
        }
    }
}

I've also included an method Dump() to convert the tree back to a string in order to better debug the algorithm itself.

Parse indented text tree in Java

Tags:

java

algorithm

Helen Araya

3 Answers

Chthonic Project

Max Seo

An implementation in C#

Uwe Keim

Recent Activity

Donate For Us

Parse indented text tree in Java

Tags:

java

algorithm

Helen Araya

3 Answers

Chthonic Project

Max Seo

An implementation in C#

Uwe Keim

Related questions

Recent Activity

Donate For Us