I'm writing a very basic parser(mostly just to better understand how they work) that takes a user's input of a select few words, detects whether the sentence structure is OK or Not OK, and outputs the result. The grammar is:
Sentence: Noun Verb
Article Sentence
Sentence Conjunction Sentence
Conjunction: "and" "or" "but"
Noun: "birds" "fish" "C++"
Verb: "rules" "fly" "swim"
Article: "the"
Writing the grammar was simple. It's implementing the code that is giving me some trouble. My psuedocode for it is:
main()
get user input (string words;)
while loop (cin >> words)
call sentence()
end main()
sentence()
call noun()
if noun() call verb() (if verb is true return "OK" ???)(else "not ok"???)
else if not noun() call article()
if article() call sentence() (if sentence is true "OK"???)(else "not"?)
else if not noun() call conjunction()
if sentence() conjunction() sentence() - no idea how to implement
return "OK"
else "not ok"
So there is my extremely sloppy psuedo code. I have a few questions on implementing it.
For the word functions (noun, verb, etc.) how should I go about checking if they are true? (as in checking if the user's input has birds, fish, fly, swim, etc.)
How should I handle the conjunction call and the output?
Should I handle the output from the main function or the call functions?
None of the above questions matter if my psuedo code is completely wrong. Is there anything wrong with the basics?
As an added note, I'm on a Chapter 6 exercise of Programming: Practice and Principles Using C++ so I'd prefer to use language syntax that I've already learned, so anything that falls into the category of advanced programming probably isn't very helpful. (The exercise specifically says not to use tokens, so count those out.)
Thanks in advance
Last Edit: In the book's public group I asked the same question and Bjarne Stroustrup commented back saying he put the exercise solution online. He basically had the input read into the sentence function and used if statements to return true or false. However, he didn't use articles so mine was much more complex. I guess if I've learned anything from this exercise its that when dealing with a lot of user input, tokenization is key (from what I know so far.) Here is my code for now. I may go back to it later because it is still very buggy and basically only returns if the sentence is OK and can't handle things like (noun, conjunction, sentence), but for now I'm moving on.
#include "std_lib_facilities.h"
bool article(string words)
{
if (words == "the")
return true;
else return false;
}
bool verb(string words)
{
if (words == "rules" || words == "fly" || words == "swim")
return true;
else return false;
}
bool noun(string words)
{
if (words == "birds" || words == "fish" || words == "c++")
return true;
else return false;
}
bool conjunction(string words)
{
if (words == "and" || words == "but" || words == "or")
return true;
else return false;
}
bool sentence()
{
string w1;
string w2;
string w3;
string w4;
cin >> w1;
if (!noun(w1) && !article(w1)) return false; // grammar of IFS!
cin >> w2;
if (noun(w1) && !verb(w2)) return false;
if (article(w1) && !noun(w2)) return false;
cin >> w3;
if (noun(w1) && verb(w2) && (w3 == ".")) return true;
if (verb(w2) && !conjunction(w3)) return false;
if (noun(w2) && !verb(w3)) return false;
if (conjunction(w3)) return sentence();
cin >> w4;
if (article(w1) && noun(w2) && verb(w3) && (w4 == ".")) return true;
if (!conjunction(w4)) return false;
if (conjunction(w4)) return sentence();
}
int main()
{
cout << "Enter sentence. Use space then period to end.\n";
bool test = sentence();
if (test)
cout << "OK\n";
else
cout << "not OK\n";
keep_window_open(); }
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).
In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language.
Parser GeneratorsThey take in a grammar as input and produce Java code to parse input. And they can handle more grammars than a recursive descent parser can.
Ok. If you really want to do it by hand :-(
There are two parts to this problem:
First you have tokenize the input stream into resonable tokens. Words would be an obvios choice but that would leave a lot of work for the syntactic phase. So I would group up your words into the following types (Conjunction,Noun,Verb,Article) and then write a lexer to return the correct Lexems.
Lexer.cpp
enum Lexeme { END,Conjunction,Noun,Verb,Article };
Lexem getNextLexme(std::istream in)
{
std::string word;
in >> word;
if (!in) {return END;}
if (word == "and") return Conjunction;
else if (word == "birds") return Noun;
else if (word == "fly") return Verb;
else if (word == "the") return Article;
... etc
}
So now you can wrtie your syntactic parser in terms of the simplified token stream.
bool ParseSentence(std::istream in)
{
Lexeme token = getNextLexme(in);
switch(token)
{
case Noun: if (!parseVerb(in))
{ return false;
}
return parseConjunctionOrEnd(in);
case Article: return ParseSentence();
case END: return true;
}
}
bool parseVerb(std::istream in)
{
Lexeme token = getNextLexeme(in);
if (token != Verb) { /*ERROR*/ return false;}
return true;
}
// etc
Your other option with syntactic analyasis is to build a state table. But that involves hand analysing the grammer and determing the states. This should only be attepted with the most trivial of grammers, anything bigger than you have here should be left to tools that can generate the state table auto-magically.
So Assuming the grammer I defined in my original post below:
And hoping I get it correct as I am not an inflable tool :-)
State 1: Start <Nothing Happened>
Article -> State 2
Noun -> State 3
Otherwise Error
State 2: Seen Article.
Noun -> State 3
Otherwise Error
State 3: Seen Noun in Sentence.
Verb -> State 4
Otherwise Error
State 4: Seen Noun Verb
End -> State 5
Conjunction -> State 1
State 5: Finished:
State 0: Error State.
int stateTable[][] // CurrentState,CurrentObject
= {/*State 0: Error State:*/{},
// END,Conjunction,Noun,Verb,Article
/*State 1: Start*/ { 0, 0, 3, 0, 2},
/*State 2: Article*/ { 0, 0, 3, 0, 0},
/*State 3: Noun*/ { 0, 0, 0, 4, 0},
/*State 4: Noun Verb*/ { 5, 1, 0, 0, 0},
/*State 5: End*/ {}
};
bool parseSentence(std::iostream& in)
{
int currentState = 1;
while((currentState != 0) && (currentState != 5))
{
int token = getNextLexme(in);
currentState = stateTable[currentState][token];
}
return currentState == 5;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With