Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Very basic English grammar parser

Tags:

c++

parsing

nlp

I'm writing a very basic parser(mostly just to better understand how they work) that takes a user's input of a select few words, detects whether the sentence structure is OK or Not OK, and outputs the result. The grammar is:

Sentence: Noun Verb

Article Sentence

Sentence Conjunction Sentence

Conjunction: "and" "or" "but"

Noun: "birds" "fish" "C++"

Verb: "rules" "fly" "swim"

Article: "the"

Writing the grammar was simple. It's implementing the code that is giving me some trouble. My psuedocode for it is:

main()
get user input (string words;)
while loop (cin >> words)
call sentence()
end main()

sentence()
call noun()
if noun() call verb() (if verb is true return "OK" ???)(else "not ok"???)
else if not noun() call article()
                if article() call sentence() (if sentence is true "OK"???)(else "not"?)
else if not noun() call conjunction()
                   if sentence() conjunction() sentence() - no idea how to implement
                                                             return "OK"
else "not ok"

So there is my extremely sloppy psuedo code. I have a few questions on implementing it.

  1. For the word functions (noun, verb, etc.) how should I go about checking if they are true? (as in checking if the user's input has birds, fish, fly, swim, etc.)

  2. How should I handle the conjunction call and the output?

  3. Should I handle the output from the main function or the call functions?

  4. None of the above questions matter if my psuedo code is completely wrong. Is there anything wrong with the basics?

As an added note, I'm on a Chapter 6 exercise of Programming: Practice and Principles Using C++ so I'd prefer to use language syntax that I've already learned, so anything that falls into the category of advanced programming probably isn't very helpful. (The exercise specifically says not to use tokens, so count those out.)

Thanks in advance

Last Edit: In the book's public group I asked the same question and Bjarne Stroustrup commented back saying he put the exercise solution online. He basically had the input read into the sentence function and used if statements to return true or false. However, he didn't use articles so mine was much more complex. I guess if I've learned anything from this exercise its that when dealing with a lot of user input, tokenization is key (from what I know so far.) Here is my code for now. I may go back to it later because it is still very buggy and basically only returns if the sentence is OK and can't handle things like (noun, conjunction, sentence), but for now I'm moving on.

#include "std_lib_facilities.h"

bool article(string words)
{
               if (words == "the")
               return true;
               else return false;        
}

bool verb(string words)
{
               if (words == "rules" || words == "fly" || words == "swim")
               return true;
               else return false;                   
}

bool noun(string words)
{
               if (words == "birds" || words == "fish" || words == "c++")
               return true;
               else return false;                   
}

bool conjunction(string words)
{
              if (words == "and" || words == "but" || words == "or")
              return true;
              else return false;                  
}

bool sentence()
{
string w1;
string w2;
string w3;
string w4;

cin >> w1;
if (!noun(w1) && !article(w1)) return false; // grammar of IFS!

cin >> w2;
if (noun(w1) && !verb(w2)) return false;
if (article(w1) && !noun(w2)) return false;

cin >> w3;
if (noun(w1) && verb(w2) && (w3 == ".")) return true;
if (verb(w2) && !conjunction(w3)) return false;
if (noun(w2) && !verb(w3)) return false;
if (conjunction(w3)) return sentence();

cin >> w4;
if (article(w1) && noun(w2) && verb(w3) && (w4 == ".")) return true;
if (!conjunction(w4)) return false;
if (conjunction(w4)) return sentence();
}


int main()
{                                   
cout << "Enter sentence. Use space then period to end.\n";
            bool test = sentence();
            if (test)
               cout << "OK\n";
            else
               cout << "not OK\n";

keep_window_open(); }

like image 693
Alex Avatar asked Jun 23 '09 16:06

Alex


People also ask

What is parsing in English grammar?

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).

Which grammar is used by parser?

In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language.

Do we use grammar in parser?

Parser GeneratorsThey take in a grammar as input and produce Java code to parse input. And they can handle more grammars than a recursive descent parser can.


1 Answers

Ok. If you really want to do it by hand :-(

There are two parts to this problem:

  • Lexical analysis
  • Syntactic Analysis.
  • We can ignore the Symantic analysis as this is why up ther.

First you have tokenize the input stream into resonable tokens. Words would be an obvios choice but that would leave a lot of work for the syntactic phase. So I would group up your words into the following types (Conjunction,Noun,Verb,Article) and then write a lexer to return the correct Lexems.

Lexer.cpp
enum Lexeme { END,Conjunction,Noun,Verb,Article };
Lexem getNextLexme(std::istream in)
{
    std::string word;
    in >> word;

    if (!in) {return END;}

         if (word == "and")   return Conjunction;
    else if (word == "birds") return Noun;
    else if (word == "fly")   return Verb;
    else if (word == "the")   return Article;

    ... etc
}

So now you can wrtie your syntactic parser in terms of the simplified token stream.

bool ParseSentence(std::istream in)
{
    Lexeme token = getNextLexme(in);
    switch(token)
    {
        case Noun:    if (!parseVerb(in))
                      {    return false;
                      }
                      return parseConjunctionOrEnd(in);
        case Article: return ParseSentence();
        case END:     return true;
    }
}
bool parseVerb(std::istream in)
{
    Lexeme token = getNextLexeme(in);
    if (token != Verb) { /*ERROR*/ return false;}
    return true;
 }
 // etc

Your other option with syntactic analyasis is to build a state table. But that involves hand analysing the grammer and determing the states. This should only be attepted with the most trivial of grammers, anything bigger than you have here should be left to tools that can generate the state table auto-magically.

So Assuming the grammer I defined in my original post below:
And hoping I get it correct as I am not an inflable tool :-)

State 1:   Start <Nothing Happened>
               Article -> State 2
               Noun -> State 3
               Otherwise Error
State 2:   Seen Article.
               Noun -> State 3
               Otherwise Error
State 3:   Seen Noun  in Sentence.
               Verb -> State 4
               Otherwise Error
State 4:   Seen Noun Verb
               End -> State 5
               Conjunction -> State 1
State 5:   Finished:

State 0:   Error State.


int stateTable[][]    // CurrentState,CurrentObject
           = {/*State 0: Error State:*/{},
                                       // END,Conjunction,Noun,Verb,Article 
              /*State 1: Start*/       {  0,  0,          3,   0,   2},
              /*State 2: Article*/     {  0,  0,          3,   0,   0},
              /*State 3: Noun*/        {  0,  0,          0,   4,   0},
              /*State 4: Noun Verb*/   {  5,  1,          0,   0,   0},
              /*State 5: End*/         {}
             };

bool parseSentence(std::iostream& in)
{
    int currentState = 1;
    while((currentState != 0) && (currentState != 5))
    {
        int token = getNextLexme(in);
        currentState = stateTable[currentState][token];
    }
    return currentState == 5;
}
like image 131
Martin York Avatar answered Sep 22 '22 17:09

Martin York