Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing a User's Query

So here's what I'm looking to achieve. I would like to give my users a single google-like textbox where they can type their queries. And I would like them to be able to express semi-natural language such as

"view all between 1/1/2008 and 1/2/2008"

it's ok if the syntax has to be fairly structured and limited to this specific domain ... these are expert users who will be using this.

Ultimately, I think I'd like the parse results to be available as some sort of expression tree. But if you've got some other ideas about what data structure might be better.

This is in C# :-)

like image 517
Joel Martinez Avatar asked Dec 09 '08 04:12

Joel Martinez


5 Answers

You are describing a programming language. Granted it's a small language (often called a little language, or Domain Specific Language (DSL)). If you've never heard the term recursive descent parser, you are probably better off following Paul's advice and using drop down boxes of some description.

However, again, I would have to agree with him, that if you want to do it, Antlr is the way to go. There are tutorials on the site that might help you get started. Basically, you will need to describe how the syntax using Backus-Naur Form notation.

You will then run Antlr over your grammer, and it will generate your parser. You can then feed the input from your textbook into an Abstract Syntax Tree. You can then use that Tree to generate your query. It's not as difficult as it all sounds, but there's a bit to it.

If you're really into this and/or want to stretch your programming wings a bit, you could read more on the topic with the Dragon Book, AKA Compilers: Principles, Techniques and Tools.

Good luck my friend.

like image 75
tsimon Avatar answered Oct 07 '22 12:10

tsimon


For a very simple language, I'd go with regexps. Main benefit there is you don't have to deal with any code generation. Debugging of the pattern matching is basically nil, though.

If your language is moderately complex (you wouldn't mind specifying the entire thing in a single grammar file), I'd go with Coco/R -- it's fast, easy to use, and makes extremely debuggable code.

For a more complex language, my current favorite is Antlr v3. Supports multi-file grammars (via the 'import' statement), which is very nice. The generated code is debuggable, but takes a bit of getting used to before debugging could be considered 'easy.'

like image 32
Alex Lyman Avatar answered Oct 07 '22 11:10

Alex Lyman


I have been in this situation before. After much discussion, we decided that context-sensitive dropdowns were a better solution than just a textbox.

For instance, have 4 dropdowns, but the last 3 are disabled. Then, when the user selects an option from the first, it populates and enables the others. So they would select "view all" then "between", and then maybe pop a textbox or a calendar for the last two.

Here's an example that's kind of like what I am talking about

like image 38
Paul Morel Avatar answered Oct 07 '22 12:10

Paul Morel


An expression tree is a good idea. There are many good general parsers and parser generators out there, open-source as well as commercial, which can transform correct query strings into expression trees.

like image 21
yfeldblum Avatar answered Oct 07 '22 13:10

yfeldblum


Use Oslo, it's designed specifically for this...

like image 22
George Tsiokos Avatar answered Oct 07 '22 11:10

George Tsiokos