Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert string to tree representation with rules

I'm to do some simple RTF text parsing, I need to correct an iss. Given the following string:

{aaaaaaa\}aaaa\{aaaaa{bbbbbbbb{ccccc\{cccc}bbb{eeeee}{{gggg}ffff}bbbbbb}aaaaa}

Where:

\ means ignore next character
{ means expand
} means collapse up to parent

At any point in the string the state might be affected by any previous character except for characters in closed tags. eg {gggg} won't affect ffff but aaaaaaa}aaa.. will affect bbbb, ccc, eee, ggg, fff and so on.

From this we can split the above to just the meaningful blocks

A1 = aaaaaaa\}aaaa\{aaaaa
B1 = bbbbbbbb
C = ccccc\{cccc
B2 = bbb
E = eeeee
G = gggg
F = ffff
B3 = bbbbbb
A2 = aaaaa

Yielding:

{A1{B1{C}B2{E}{{G}F}B3}A2}

To describe the dependency I used X > Y means Y depends on X (as in X may change the meaning of Y)

A1
A1 > B1
A1 > B1 > C
A1 > B1 > B2
A1 > B1 > B2 > E
A1 > B1 > B2 > G
A1 > B1 > B2 > F
A1 > B1 > B2 > B3
A1 > B1 > B2 > A2
A1 > A2

So if we then have a node that can have a value and a ordered list of sub values. Such that the value tree would look like this:

A1
- B1
- - C
- - B2
- - - E
- - - G
- - - F
- - - B3
- A2

Then to get the characters that affect any node, I can just step up through each parent recursively.

What I keep getting stuck on is trying to parse the string into my node class:

public class myNode
{
    public myNode Parent;
    public string Value;
    public List<myNode> subNodes;
}

I read the string character by character, when I encounter a \ I increment by two. When I encounter a { I save the previous text section as the node value and step into the child, and when I encounter a } I step down.

But I keep messing up the logic, especially for G and A2. It's simple to do on paper but when I then try having to do the actual logic for step down I keep messing it up.

Is there a more straight forward way to make this structure? (or is there a better structure I should be using). I would think that there should be some library that allows conversions of strings to trees but I can't seem to find any.

like image 355
Seph Avatar asked May 04 '12 08:05

Seph


People also ask

How do you make a string tree?

Construct a binary tree from a string consisting of parenthesis and integers. The whole input represents a binary tree. It contains an integer followed by zero, one or two pairs of parenthesis. The integer represents the root's value and a pair of parenthesis contains a child binary tree with the same structure.

How do you represent a tree in a linked list?

Linked List Representation of Binary Tree. We use a double linked list to represent a binary tree. In a double linked list, every node consists of three fields. First field for storing left child address, second for storing actual data and third for storing right child address.

Can BST be used for strings?

As the name suggests, the most frequent operation in a BST with strings is searching for a specific string. Starting from the root we follow a downward path until we find the requested string.

How do I convert a string to an int in C++?

One effective way to convert a string object into a numeral int is to use the stoi() function. This method is commonly used for newer versions of C++, with is being introduced with C++11. It takes as input a string value and returns as output the integer version of it.


1 Answers

Use a "state machine" approach, where the state is the current node, and an escape flag:

string rtf = @"{aaaaaaa\}aaaa\{aaaaa{bbbbbbbb{ccccc\{cccc}bbb{eeeee}{{gggg}ffff}bbbbbb}aaaaa}";

Node root = new Node { Parent = null, Value = "root", SubNodes = new List<Node>() };
Node node = root;
bool escape = false;
foreach (char c in rtf) {
  if (escape) {
    node.Value += c;
    escape = false;
  } else {
    switch (c) {
      case '{':
        node = new Node { Parent = node, Value = String.Empty, SubNodes = new List<Node>() };
        node.Parent.SubNodes.Add(node);
        break;
      case '}':
        node = new Node { Parent = node.Parent.Parent, Value = String.Empty, SubNodes = new List<Node>() };
        if (node.Parent != null) node.Parent.SubNodes.Add(node);
        break;
      case '\\':
        escape = true;
        break;
      default:
        node.Value += c;
        break;
    }
  }
}

PrintNode(root, String.Empty);

The Node class (just renamed a little):

public class Node {
  public Node Parent;
  public string Value;
  public List<Node> SubNodes;
}

For display:

private static void PrintNode(Node node, string level) {
  if (node.Value.Length > 0) Console.WriteLine(level + node.Value);
  foreach (Node n in node.SubNodes) {
    PrintNode(n, level + "  ");
  }
}

Output:

root
  aaaaaaa}aaaa{aaaaa
    bbbbbbbb
      ccccc{cccc
    bbb
      eeeee
        gggg
      ffff
    bbbbbb
  aaaaa

Note that the G node is not a child of the E node, but a child of a node with an empty value.

Then of course you also have to add some error handling.

like image 110
Guffa Avatar answered Sep 28 '22 20:09

Guffa