 

Convert user input to regex

I am working on a project where the user enters a human-readable search string with AND/OR operators. Here are three examples:

  1. a AND (b OR c) -> (?=.*\ba\b)(?=.*(\bb\b|\bc\b)).*
  2. a OR (b AND c)
  3. (a OR b) AND (c OR d)

These are samples of the input I might get. I want to take that input and convert it to a regex. Isn't this essentially a compiler? Looking at it, what I want to do is translate a high-level command into a low-level one. Do you have any suggestions on how I could accomplish this? Ultimately, I want to pass the generated regex to jsoup (via the :matchesOwn pseudo-selector) and query an HTML document. Thank you for your help.

asked Mar 16 '13 at 11:03 by Alkis Kalogeris



1 Answer

The general way of doing this is to build an intermediate representation in the form of an easily traversable data structure, usually called an abstract syntax tree (AST). If you're not familiar with the concept, have a look at calculator-ast, which does this transformation for a calculator language.
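To make the idea concrete, here is a minimal sketch of what such an AST could look like in Java (the language jsoup is written in). The type names `Node`, `Term`, `And` and `Or` and the helper `show` are hypothetical, chosen for illustration, and the code assumes Java 17 or later for records and sealed interfaces. It builds the tree for `a AND (b OR c)` by hand to show that the parentheses disappear while the nesting remains.

```java
// Hypothetical AST node types for the AND/OR query language (names are illustrative).
sealed interface Node permits Term, And, Or {}

// A leaf: a single search word such as "a".
record Term(String word) implements Node {}

// Both children must match.
record And(Node left, Node right) implements Node {}

// At least one child must match.
record Or(Node left, Node right) implements Node {}

class AstDemo {
    // Render the tree in prefix form so its shape is visible, e.g. AND(a, OR(b, c)).
    static String show(Node n) {
        if (n instanceof Term t) return t.word();
        if (n instanceof And a) return "AND(" + show(a.left()) + ", " + show(a.right()) + ")";
        Or o = (Or) n;
        return "OR(" + show(o.left()) + ", " + show(o.right()) + ")";
    }

    public static void main(String[] args) {
        // "a AND (b OR c)" built by hand: grouping is expressed by nesting, not by parentheses.
        Node query = new And(new Term("a"), new Or(new Term("b"), new Term("c")));
        System.out.println(show(query)); // prints AND(a, OR(b, c))
    }
}
```

Because every node is one of three small types, later passes (such as regex generation) only need one case per node type.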

To turn the user's input strings into ASTs, you need a parser. You could have a look at ANTLR. Personally I use v3; v4 seems to be less mature. Have a look at antlr3.org. If you want to write the parser yourself, you could give a Pratt parser a shot. This is not trivial, and adding good error handling takes time, but it can be a fun exercise.
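If you go the hand-written route, a Pratt parser for this small grammar fits on one screen. The sketch below is one possible shape, not a reference implementation. It assumes whitespace-separated words, uppercase `AND`/`OR` as the only operators (so a lowercase "and" is treated as a search word), and that AND binds tighter than OR, as in ordinary boolean logic; the question does not say which precedence it wants. `QueryParser`, its methods and the AST record names are hypothetical, and Java 17+ is assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST types: word leaves plus binary AND/OR nodes.
sealed interface Node permits Term, And, Or {}
record Term(String word) implements Node {}
record And(Node left, Node right) implements Node {}
record Or(Node left, Node right) implements Node {}

// A small Pratt parser for queries like "(a OR b) AND (c OR d)".
class QueryParser {
    private final List<String> tokens;
    private int pos = 0;

    QueryParser(String input) {
        this.tokens = tokenize(input);
    }

    // Split on whitespace, turning "(" and ")" into tokens of their own.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        String spaced = input.replace("(", " ( ").replace(")", " ) ");
        for (String chunk : spaced.trim().split("\\s+")) {
            if (!chunk.isEmpty()) out.add(chunk);
        }
        return out;
    }

    // Binding power of an infix operator: AND binds tighter than OR.
    // Anything else (a word, "(" or ")") returns 0, which ends the infix loop.
    private static int bindingPower(String tok) {
        return switch (tok) {
            case "AND" -> 20;
            case "OR" -> 10;
            default -> 0;
        };
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    // Entry point: parse the whole input and reject trailing tokens.
    Node parse() {
        Node result = parseExpr(0);
        if (peek() != null) throw new IllegalArgumentException("unexpected token: " + peek());
        return result;
    }

    // Core Pratt loop: read one operand, then keep absorbing operators
    // whose binding power is greater than minBp.
    private Node parseExpr(int minBp) {
        Node left = parseOperand();
        while (peek() != null && bindingPower(peek()) > minBp) {
            String op = next();
            // Passing the operator's own power makes equal-precedence chains left-associative.
            Node right = parseExpr(bindingPower(op));
            left = op.equals("AND") ? new And(left, right) : new Or(left, right);
        }
        return left;
    }

    // An operand is either a word or a parenthesised sub-expression.
    private Node parseOperand() {
        String tok = next();
        if (tok.equals("(")) {
            Node inner = parseExpr(0);
            if (!next().equals(")")) throw new IllegalArgumentException("expected ')'");
            return inner;
        }
        if (tok.equals(")") || tok.equals("AND") || tok.equals("OR")) {
            throw new IllegalArgumentException("expected a word or '(' but got " + tok);
        }
        return new Term(tok);
    }

    public static void main(String[] args) {
        for (String q : new String[] {"a AND (b OR c)", "a OR (b AND c)", "(a OR b) AND (c OR d)"}) {
            System.out.println(q + "  =>  " + new QueryParser(q).parse());
        }
    }
}
```

The binding-power table is the only place precedence lives, so adding an operator such as NOT later mostly means adding a prefix case in `parseOperand`. Error messages here are deliberately minimal; reporting positions would be the first thing to improve.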

Once you have an AST, turning it into a regex should be straightforward: traverse the AST and emit the corresponding regex fragment for each node as you go.
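One possible traversal is sketched below, assuming the same lookahead style as the question's first example: a word becomes `(?=.*\bword\b)`, AND places its two conditions back to back (both must hold at the same position), and OR wraps its two conditions in a non-capturing alternation `(?:…|…)` so each branch is a complete condition on its own. The record names and `RegexEmitter` are hypothetical, the `\b` anchors assume search words start and end with letters, digits or `_`, and Java 17+ is assumed.

```java
import java.util.regex.Pattern;

// Hypothetical AST types; any tree with word leaves and AND/OR nodes works the same way.
sealed interface Node permits Term, And, Or {}
record Term(String word) implements Node {}
record And(Node left, Node right) implements Node {}
record Or(Node left, Node right) implements Node {}

class RegexEmitter {
    // Translate a node into a zero-width condition evaluated at the start of the text:
    //   Term -> lookahead that finds the whole word anywhere after this point
    //   And  -> both conditions back to back (both must hold at the same position)
    //   Or   -> non-capturing alternation of the two conditions
    static String condition(Node n) {
        if (n instanceof Term t) {
            // Pattern.quote keeps characters such as '.' or '+' in a search word literal.
            return "(?=.*\\b" + Pattern.quote(t.word()) + "\\b)";
        }
        if (n instanceof And a) {
            return condition(a.left()) + condition(a.right());
        }
        Or o = (Or) n;
        return "(?:" + condition(o.left()) + "|" + condition(o.right()) + ")";
    }

    // (?s) lets '.' cross line breaks, ^ pins all lookaheads to the start,
    // and the trailing .* consumes the text once every condition has passed.
    static String toRegex(Node root) {
        return "(?s)^" + condition(root) + ".*";
    }

    public static void main(String[] args) {
        // a AND (b OR c), built by hand here; normally it would come from the parser.
        Node query = new And(new Term("a"), new Or(new Term("b"), new Term("c")));
        Pattern p = Pattern.compile(toRegex(query));
        System.out.println(toRegex(query));
        for (String text : new String[] {"a and c", "b then a", "a alone", "b and c"}) {
            System.out.println(text + " -> " + p.matcher(text).find());
        }
    }
}
```

Grouping the OR branches keeps the alternation from leaking into the surrounding lookaheads, which is easy to get wrong when writing such patterns by hand. On the jsoup side, one option is to compile the string and call `Element.getElementsMatchingOwnText(Pattern)`, which should select the same elements as `:matchesOwn(...)` while avoiding the need to embed the regex inside a selector string.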

Good luck!

answered Sep 30 '22 at 10:09 by Alexander Torstling
