I have some ideas to do with natural language processing. I will need some grammars of the S -> NP VP variety in order to play with them.
If I try to write these rules myself it will be a tedious and error-prone business. Has anyone ever typed up and released comprehensive rule sets for English and other natural languages? Ideally written in BNF, Prolog or similar syntax.
My project only relates to context-free grammars; I'm not interested in statistical methods or machine learning -- I need to systematically produce English-like and Foobarian-like sentences.
If you know where to find such material, I'd very much appreciate it.
You might want to look at Attempto Controlled English and its Prolog-based tools.
Since statistical parsing came into vogue in the early 90s, grammars have usually not been distributed as such, except for specific problem domains; instead, they are derived from distributed corpora such as the Penn Treebank. If you can get hold of that (I believe a sample is distributed with NLTK), you can "roll your own" grammar by looking at all tree fragments and translating them to rules. (E.g., if you find a node labeled S with children labeled NP and VP, you know there should be a rule S -> NP VP. Pruning the rules that occur infrequently would be a good idea.)
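A rough sketch of that "roll your own" idea, in plain Python with no dependencies. The three bracketed trees are made up for illustration and only imitate the Penn Treebank's bracketed notation; they are not taken from the corpus. The helper names (`parse_tree`, `phrasal_rules`, `count_rules`, `prune`, `format_rule`) are hypothetical, and word-level rules such as NN -> dog are left out to keep it short.

```python
from collections import Counter

# Made-up trees in Penn-style bracketing (an assumption, not real corpus data).
TOY_TREES = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (DT a) (NN cat)) (VP (VBD slept)))",
    "(S (NP (NN rain)) (VP (VBD fell)))",
]

def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return node()

def phrasal_rules(tree):
    """Yield (lhs, rhs) for every node whose children are all subtrees."""
    label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    if subtrees and len(subtrees) == len(children):
        yield (label, tuple(c[0] for c in subtrees))
    for c in subtrees:
        yield from phrasal_rules(c)

def count_rules(tree_strings):
    """Count how often each phrasal rule occurs across all trees."""
    counts = Counter()
    for text in tree_strings:
        counts.update(phrasal_rules(parse_tree(text)))
    return counts

def prune(counts, min_count):
    """Keep only rules seen at least min_count times."""
    return {rule: n for rule, n in counts.items() if n >= min_count}

def format_rule(rule):
    lhs, rhs = rule
    return lhs + " -> " + " ".join(rhs)

if __name__ == "__main__":
    kept = prune(count_rules(TOY_TREES), min_count=2)
    for rule, n in sorted(kept.items()):
        print(format_rule(rule), " # seen", n, "times")
```

With a real treebank you would read the bracketed trees from the corpus files instead of `TOY_TREES`, and you would probably want a much higher `min_count`.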
The most comprehensive context-free grammar for English that I know of is the one described in:
Gazdar, Gerald; Ewan H. Klein, Geoffrey K. Pullum, Ivan A. Sag. 1985. Generalized Phrase Structure Grammar. Oxford: Blackwell.
There are also several rule-based but non-context-free grammars freely available online, e.g., the Penn XTAG grammar or the HPSG English Resource Grammar.