I'm to do some simple RTF text parsing, I need to correct an iss. Given the following string:
{aaaaaaa\}aaaa\{aaaaa{bbbbbbbb{ccccc\{cccc}bbb{eeeee}{{gggg}ffff}bbbbbb}aaaaa}
Where:
\ means ignore next character
{ means expand
} means collapse up to parent
At any point in the string the state might be affected by any previous character except for characters in closed tags. eg {gggg} won't affect ffff but aaaaaaa}aaa.. will affect bbbb, ccc, eee, ggg, fff
and so on.
From this we can split the above to just the meaningful blocks
A1 = aaaaaaa\}aaaa\{aaaaa
B1 = bbbbbbbb
C = ccccc\{cccc
B2 = bbb
E = eeeee
G = gggg
F = ffff
B3 = bbbbbb
A2 = aaaaa
Yielding:
{A1{B1{C}B2{E}{{G}F}B3}A2}
To describe the dependency I used X > Y means Y depends on X (as in X may change the meaning of Y)
A1
A1 > B1
A1 > B1 > C
A1 > B1 > B2
A1 > B1 > B2 > E
A1 > B1 > B2 > G
A1 > B1 > B2 > F
A1 > B1 > B2 > B3
A1 > B1 > B2 > A2
A1 > A2
So if we then have a node that can have a value and a ordered list of sub values. Such that the value tree would look like this:
A1
- B1
- - C
- - B2
- - - E
- - - G
- - - F
- - - B3
- A2
Then to get the characters that affect any node, I can just step up through each parent recursively.
What I keep getting stuck on is trying to parse the string into my node class:
public class myNode
{
public myNode Parent;
public string Value;
public List<myNode> subNodes;
}
I read the string character by character, when I encounter a \
I increment by two. When I encounter a {
I save the previous text section as the node value and step into the child, and when I encounter a }
I step down.
But I keep messing up the logic, especially for G
and A2
. It's simple to do on paper but when I then try having to do the actual logic for step down I keep messing it up.
Is there a more straight forward way to make this structure? (or is there a better structure I should be using). I would think that there should be some library that allows conversions of strings to trees but I can't seem to find any.
Construct a binary tree from a string consisting of parenthesis and integers. The whole input represents a binary tree. It contains an integer followed by zero, one or two pairs of parenthesis. The integer represents the root's value and a pair of parenthesis contains a child binary tree with the same structure.
Linked List Representation of Binary Tree. We use a double linked list to represent a binary tree. In a double linked list, every node consists of three fields. First field for storing left child address, second for storing actual data and third for storing right child address.
As the name suggests, the most frequent operation in a BST with strings is searching for a specific string. Starting from the root we follow a downward path until we find the requested string.
One effective way to convert a string object into a numeral int is to use the stoi() function. This method is commonly used for newer versions of C++, with is being introduced with C++11. It takes as input a string value and returns as output the integer version of it.
Use a "state machine" approach, where the state is the current node, and an escape flag:
string rtf = @"{aaaaaaa\}aaaa\{aaaaa{bbbbbbbb{ccccc\{cccc}bbb{eeeee}{{gggg}ffff}bbbbbb}aaaaa}";
Node root = new Node { Parent = null, Value = "root", SubNodes = new List<Node>() };
Node node = root;
bool escape = false;
foreach (char c in rtf) {
if (escape) {
node.Value += c;
escape = false;
} else {
switch (c) {
case '{':
node = new Node { Parent = node, Value = String.Empty, SubNodes = new List<Node>() };
node.Parent.SubNodes.Add(node);
break;
case '}':
node = new Node { Parent = node.Parent.Parent, Value = String.Empty, SubNodes = new List<Node>() };
if (node.Parent != null) node.Parent.SubNodes.Add(node);
break;
case '\\':
escape = true;
break;
default:
node.Value += c;
break;
}
}
}
PrintNode(root, String.Empty);
The Node class (just renamed a little):
public class Node {
public Node Parent;
public string Value;
public List<Node> SubNodes;
}
For display:
private static void PrintNode(Node node, string level) {
if (node.Value.Length > 0) Console.WriteLine(level + node.Value);
foreach (Node n in node.SubNodes) {
PrintNode(n, level + " ");
}
}
Output:
root
aaaaaaa}aaaa{aaaaa
bbbbbbbb
ccccc{cccc
bbb
eeeee
gggg
ffff
bbbbbb
aaaaa
Note that the G node is not a child of the E node, but a child of a node with an empty value.
Then of course you also have to add some error handling.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With