Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recursive Descent Parser in Java

I would like to preface this by saying this is a homework assignment for my third Year Programming Languages Class, and I'm looking for some help with it. My assignment reads:

Deadline: February 22, 2013 at 11:55pm
Submission: Please upload the following to the CMS.

1. Source code
2. A screen shot of the execution of your program including the input file you used

Use any programming language you prefer to write a recursive-descent parser that parses the language generated by the following EBNF descriptions. Your parser should detect whether or not the input program has any syntax errors. It does not have to specify what and where the error is.

<program>   begin <stmt_list> end
<stmt_list>  <stmt> {;<stmt_list>}
<stmt>    <assign_stmt> | <while_stmt>
<assign_stmt>  <var> = <expr>
<var>  identifier  (An identifier is a string that begins with a letter followed by 0     or more letters and digits)
<expr>  <var> { (+|-) <var>}           
<while_stmt>   while (<logic_expr>)  <stmt>
<logic_expr> ® <var> (< | >) <var>  (Assume that logic expressions have only less than     or greater than operators)

The symbols that look funny are just arrows pointing to the right.

My problem at the moment is more logical then it is programming: in my first attempt, I read the entire input program in, saved it to a string, then parsed that string and converted every symbol to either a terminal, expr, or what have you.

I eventually found that this way would not work because, A: I don't think that it is RDP, B: many of the non terminals are made of more then 1 statement.

I gave up on that approach, and decided that before I waste more time programming, I would Pseudo everything out. My new idea was to make 1 method, for each non terminal symbol, and just parse the input string symbol by symbol, hoping between those methods. This approach seemed logical, but as I started writing the pseudocode I got very lost and confused as to what I needed to do. How would I finish this code?

Here is some pseudocode for RDP:

intputString;

public void parseProgram (Symbol.typeIsProgram) {
    if getNextSymbol == "begin" {
        if (intputString.substring (inputString.length()-3,
                inputString.length()) == "end") {
            Symbol stmt_lsit = new Symbol (intputString)
            parseStmt_list(stmt_list);              
        } else {
            Out "error, prog must end with end"
        }
    } else {
        Out "error, prog must begin with begin"
    }   
}

public void parseStmt_list (Stmbol.typeIsStmt_list) {
    symbol = getNextSymbol;
    if (Symbol.typeIsVar) {
        parseVar(symbol)
    } else if (Symbol.typeIsWhile)  {
        // weve only capture if the first word return is a while, we dont have the whole while statement yet
        ParseWhile_stmt(symbol)
    } else { }
}

public void parseStmt () { }
public void parseAssign_stmt () { }
public void parseVar () { }
public void parseExpr () { }
public void parseWhile_stmt () { }
public void parseLogic_expr () { }

public Symbol getNextSymbol() {
    //returns the next symbol in input string and removes it from the input string
}

Just an FYI a sample input program for my parser would be.

begin 
total = var1 + var2; 
while (var1 < var2) 
while ( var3 > var4)
var2 = var2 - var1 
end
like image 763
user1972748 Avatar asked Feb 16 '13 18:02

user1972748


1 Answers

According to this particular assignment it is alright to use string processing in a functional way.

Try this algorithm:

  1. check, that line begins with begin and ends with end
  2. if yes - split remaining string with ; and do the following steps for each part (statement)
  3. check if statement consists a = or while substring
  4. for assignment check you can split an input with + or - and for each part check the variable condition (alphanumeric content)
  5. for while - check for ( and ) and recursively process two strings between brackets and after it
  6. finally, for logical expression split by substring < or >, and check that parts are variables (alphanumeric)

Maybe, it is not very flexible solution, but as I see it is acceptable for your task.

EDIT:

I found the task interesting and tried to write a complete solution.

public enum Parser {
PROGRAM {
    void parse(String s) throws ParseException {
        s = s.trim();
        if (s.startsWith("begin") && s.endsWith("end")) {
            STATEMENT_LIST.parse(s.substring(5, s.length() - 3));
        } else {
            throw new ParseException("Illegal begin/end");
        }
    }
},

STATEMENT_LIST {
    void parse(String s) throws ParseException {
        String[] parts = s.trim().split(";");
        for (String part : parts) {
            STATEMENT.parse(part.trim());
        }
    }
},

STATEMENT {
    void parse(String s) throws ParseException {
        if (s.startsWith("while")) {
            WHILE.parse(s.substring(5));
        } else if (s.contains("=")) {
            ASSIGNMENT.parse(s);
        } else {
            throw new ParseException("Illegal statement: " + s);
        }
    }
},

WHILE {
    void parse(String s) throws ParseException {
        int i = s.indexOf("(");
        int j = s.indexOf(")");
        if (i != -1 && j != -1) {
            LOGICAL.parse(s.substring(i + 1, j).trim());
            STATEMENT.parse(s.substring(j + 1).trim());
        } else {
            throw new ParseException("Illegal while: " + s);
        }
    }
},

ASSIGNMENT {
    void parse(String s) throws ParseException {
        int i = s.indexOf("=");
        if (i != -1) {
            VARIABLE.parse(s.substring(0, i).trim());
            EXPRESSION.parse(s.substring(i + 1).trim());
        }
    }
},

EXPRESSION {
    void parse(String s) throws ParseException {
        String[] parts = s.split("\\+|-");
        for (String part : parts) {
            VARIABLE.parse(part.trim());
        }
    }
},

LOGICAL {
    void parse(String s) throws ParseException {
        int i;
        if (s.contains("<")) {
            i = s.indexOf("<");
        } else if (s.contains(">")) {
            i = s.indexOf(">");
        } else {
            throw new ParseException("Illegal logical: " + s);
        }

        VARIABLE.parse(s.substring(0, i).trim());
        VARIABLE.parse(s.substring(i + 1).trim());
    }
},

VARIABLE {
    void parse(String s) throws ParseException {
        if (!s.matches("^[a-zA-Z][a-zA-Z0-9]*$")) {
            throw new ParseException("Illegal variable: " + s);
        }
    }
};

abstract void parse(String s) throws ParseException;

public static void main(String[] args) {
    try {
        PROGRAM.parse("begin \n" +
                "total = var1 + var2; \n" +
                "while (var1 < var2) \n" +
                "while ( var3 > var4)\n" +
                "var2 = var2 - var1 \n" +
                "end");
        System.out.println("OK");
    } catch (ParseException e) {
        System.out.println(e.getMessage());
    }
}
}

class ParseException extends Exception {
public ParseException(String message) {
    super(message);
}
}
like image 79
Dmitry Tikhonov Avatar answered Oct 14 '22 07:10

Dmitry Tikhonov