I'm pretty much of a noob in regards to C programming.
Been trying for a few days to create a binary tree from expressions of the form:
A(B,C(D,$))
Where each letters are nodes.
'('
goes down a level in my tree (to the right).
','
goes to the left-side branch of my tree
'$'
inserts a NULL node.
')'
means going up a level.
This is what I came up with after 2-3 days of coding:
#define SUCCESS 0
typedef struct BinaryTree
{
char info;
BinaryTree *left,*right,*father;
}BinaryTree;
int create(BinaryTree*nodeBT, const char *expression)
{
nodeBT *aux;
nodeBT *root;
nodeBT *parent;
nodeBT=(BinaryTree*) malloc (sizeof(BinaryTree));
nodeBT->info=*expression;
nodeBT->right=nodeBT->left=NULL;
nodeBT->father = NULL;
++expression;
parent=nodeBT;
root=nodeBT;
while (*expression)
{if (isalpha (*expression))
{aux=(BinaryTree*) malloc (sizeof(BinaryTree));
aux->info=*expression;
aux->dr=nodeBT->st=NULL;
aux->father= parent;
nodeBT=aux;}
if (*expression== '(')
{parent=nodeBT;
nodeBT=nodeBT->dr;}
if (*expression== ',')
{nodeBT=nodeBT->father;
nodeBT=nodeBT->dr;}
if (*expression== ')')
{nodeBT=nodeBT->father;
parent= nodeBT->nodeBT;}
if (*expression== '$')
++expression;
++expression;
}
nodeBT=root;
return SUCCESS;
}
At the end, while trying to access the newly created tree, I keep getting "memory unreadable 0xCCCCCC". And I haven't got the slightest hint where I'm getting it wrong.
Any idea ?
Several problems:
You haven't shown us the definition of type nodeBT
, but you've declared aux
, root
, and parent
to be pointers to that type.
You then assign aux
to point to a BinaryTree
even though it's declared to point to a nodeBT
.
You assign to aux->dr
, which isn't part of BinaryTree
, so I can't just assume you typed nodeBT
where you meant BinaryTree
. You assign to nodeBT->st
, that is not a part of BinaryTree
either.
You try to return the parsed tree by assigning nodeBT=root
. The problem is that C is a “call-by-value” language. This implies that when your create
function assigns to nodeBT
, it is only changing its local variable's value. The caller of create
doesn't see that change. So the caller doesn't receive the root node. That's probably why you're getting your “memory unreadable” error; the caller is accessing some random memory, not the memory containing the root node.
Your code will actually be much easier to understand if you write your parser using a standard technique called “recursive descent”. Here's how.
Let's write a function that parses one node from the expression string. Naively, it should have a signature like this:
BinaryTree *nodeFromExpression(char const *expression) {
To parse a node, we first need to get the node's info
:
char info = expression[0];
Next, we need to see if the node should have children.
BinaryTree *leftChild = NULL;
BinaryTree *rightChild = NULL;
if (expression[1] == '(') {
If it should have children, we need to parse them. This is where we put the “recursive” in “recursive descent”: we just call nodeFromExpression
again to parse each child. To parse the left child, we need to skip the first two characters in expression
, since those were the info and the (
of the current node:
leftChild = nodeFromExpression(expression + 2);
But how much do we skip to parse the right child? We need to skip all the characters that we used while parsing the left child…
rightChild = nodeFromExpression(expression + ???
We don't know how many characters that was! It turns out we need to make nodeFromExpression
return not just the node it parsed, but also some indication of how many characters it consumed. So we need to change the signature of nodeFromExpression
to allow that. And what if we run into an error while parsing? Let's define a structure that nodeFromExpression
can use to return the node it parsed, the number of characters it consumed, and the error it encountered (if there was one):
typedef struct {
BinaryTree *node;
char const *error;
int offset;
} ParseResult;
We'll say that if error
is non-null, then node
is null and offset
is the offset in the string where we found the error. Otherwise, offset
is just past the last character consumed to parse node
.
So, starting over, we'll make nodeFromExpression
return a ParseResult
. It will take the entire expression string as input, and it will take the offset in that string at which to start parsing:
ParseResult nodeFromExpression(char const *expression, int offset) {
Now that we have a way to report errors, let's do some error checking:
if (!expression[offset]) {
return (ParseResult){
.error = "end of string where info expected",
.offset = offset
};
}
char info = expression[offset++];
I didn't mention this the first time through, but we should handle your $
token for NULL here:
if (info == '$') {
return (ParseResult){
.node = NULL,
.offset = offset
};
}
Now we can get back to parsing the children.
BinaryTree *leftChild = NULL;
BinaryTree *rightChild = NULL;
if (expression[offset] == '(') {
So, to parse the left child, we just call ourselves recursively again. If the recursive call gets an error, we return the same result:
ParseResult leftResult = nodeFromExpression(expression, offset);
if (leftResult->error)
return leftResult;
OK, we parsed the left child successfully. Now we need to check for, and consume, the comma between the children:
offset = leftResult.offset;
if (expression[offset] != ',') {
return (ParseResult){
.error = "comma expected",
.offset = offset
};
}
++offset;
Now we can recursively call nodeFromExpression
to parse the right child:
ParseResult rightResult = nodeFromExpression(expression, offset);
The error case now is a bit more complex if we don't want to leak memory. We need to free the left child before returning the error:
if (rightResult.error) {
free(leftResult.node);
return rightResult;
}
Note that free
does nothing if you pass it NULL
, so we don't need to check for that explicitly.
Now we need to check for, and consume, the )
after the children:
offset = rightResult.offset;
if (expression[offset] != ')') {
free(leftResult.node);
free(rightResult.node);
return (ParseResult){
.error = "right parenthesis expected",
.offset = offset
};
}
++offset;
We need to set our local leftChild
and rightChild
variables while the leftResult
and rightResult
variables are still in scope:
leftChild = leftResult.node;
rightChild = rightResult.node;
}
We've parsed both children, if we needed to, so now we're ready to construct the node we need to return:
BinaryTree *node = (BinaryTree *)calloc(1, sizeof *node);
node->info = info;
node->left = leftChild;
node->right = rightChild;
We have one last thing to do: we need to set the father
pointers of the children:
if (leftChild) {
leftChild->father = node;
}
if (rightChild) {
rightChild->father = node;
}
Finally, we can return a successful ParseResult
:
return (ParseResult){
.node = node,
.offset = offset
};
}
I've put all the code in this gist for easy copy'n'paste.
If your compiler doesn't like the (ParseResult){ ... }
syntax, you should look for a better compiler. That syntax has been standard since 1999 (§6.5.2.5 Compound Literals). While you're looking for a better compiler, you can work around it like this.
First, add two static functions:
static ParseResult ParseResultMakeWithNode(BinaryTree *node, int offset) {
ParseResult result;
memset(&result, 0, sizeof result);
result.node = node;
result.offset = offset;
return result;
}
static ParseResult ParseResultMakeWithError(char const *error, int offset) {
ParseResult result;
memset(&result, 0, sizeof result);
result.error = error;
result.offset = offset;
return result;
}
Then, replace the problematic syntax with calls to these functions. Examples:
if (!expression[offset]) {
return ParseResultMakeWithError("end of string where info expected",
offset);
}
if (info == '$') {
return ParseResultMakeWithNode(NULL, offset);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With