I am working on a project where the user enters a human-readable search string containing AND and OR operators. I give three examples
The above are samples of the input I might get. I want to take that input and convert it to a regex. Isn't this an example of a compiler? As I see it, what I want to do is convert a high-level command into a low-level one. Do you have any suggestions on how I could accomplish this? My goal is to pass the generated regex to jsoup (the :matchesOwn pseudo-selector) and query an HTML document. Thank you for your help.
The general way of doing this is to build an intermediate representation in the form of an easily traversable data structure. This is usually called an abstract syntax tree (AST). If you're not familiar with the concept, have a look at calculator-ast, which does this transformation for a calculator language.
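To make the idea concrete, here is a minimal sketch of what such an AST could look like in Java. The question's sample inputs are not shown, so the grammar is an assumption: plain words combined with AND and OR. The names Node, Term, And, Or and AstDemo are made up for illustration, and records need Java 16 or later.

```java
// Hypothetical AST for a search language with words, AND, and OR.
// Requires Java 16+ for records.
interface Node {
    String show(); // fully parenthesized rendering, handy for debugging
}

record Term(String word) implements Node {
    public String show() { return word; }
}

record And(Node left, Node right) implements Node {
    public String show() { return "(" + left.show() + " AND " + right.show() + ")"; }
}

record Or(Node left, Node right) implements Node {
    public String show() { return "(" + left.show() + " OR " + right.show() + ")"; }
}

class AstDemo {
    // Hand-built tree for the assumed input: apple AND (pie OR tart)
    static Node sample() {
        return new And(new Term("apple"), new Or(new Term("pie"), new Term("tart")));
    }

    public static void main(String[] args) {
        System.out.println(sample().show()); // prints (apple AND (pie OR tart))
    }
}
```

Each operator becomes a node with two children and each search word becomes a leaf, so any later step only has to walk the tree.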
To turn the user input strings into ASTs, you need a parser. You could have a look at ANTLR. Personally I use v3; v4 seems to be less mature. Have a look at antlr3.org. If you want to write the parser yourself, you could give a Pratt parser a shot. This is not trivial, and incorporating nice error handling takes time, but it can be a fun exercise.
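If you go the hand-written route, a Pratt parser for this kind of input can stay quite small. The following is only a sketch under assumptions of mine: tokens are separated by whitespace, parentheses group, AND binds tighter than OR, and both are left-associative. QueryParser and the node types are hypothetical names, and the error handling is deliberately bare.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

interface Node { String show(); }
record Term(String word) implements Node { public String show() { return word; } }
record And(Node left, Node right) implements Node {
    public String show() { return "(" + left.show() + " AND " + right.show() + ")"; }
}
record Or(Node left, Node right) implements Node {
    public String show() { return "(" + left.show() + " OR " + right.show() + ")"; }
}

class QueryParser {
    // A token is a single parenthesis or a run of non-space, non-parenthesis characters.
    private static final Pattern TOKEN = Pattern.compile("[()]|[^\\s()]+");
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private QueryParser(String input) {
        Matcher m = TOKEN.matcher(input);
        while (m.find()) tokens.add(m.group());
    }

    static Node parse(String input) {
        QueryParser p = new QueryParser(input);
        Node result = p.expression(0);
        if (p.pos < p.tokens.size())
            throw new IllegalArgumentException("Unexpected token: " + p.peek());
        return result;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        return tokens.get(pos++);
    }

    // Binding power of an infix operator; 0 means "not an operator".
    private static int bindingPower(String token) {
        if ("OR".equals(token)) return 1;
        if ("AND".equals(token)) return 2;
        return 0;
    }

    // Core Pratt loop: keep absorbing operators that bind tighter than minPower.
    private Node expression(int minPower) {
        Node left = prefix();
        while (bindingPower(peek()) > minPower) {
            String op = next();
            Node right = expression(bindingPower(op)); // same power on the right => left-associative
            left = op.equals("AND") ? new And(left, right) : new Or(left, right);
        }
        return left;
    }

    // Handles what may start an expression: a word or a parenthesized group.
    private Node prefix() {
        String token = next();
        if (token.equals("(")) {
            Node inner = expression(0);
            if (!")".equals(peek())) throw new IllegalArgumentException("Expected ')'");
            next();
            return inner;
        }
        if (token.equals(")") || bindingPower(token) > 0)
            throw new IllegalArgumentException("Unexpected token: " + token);
        return new Term(token);
    }
}
```

With these binding powers, QueryParser.parse("a OR b AND c").show() gives (a OR (b AND c)). Supporting another operator later mostly means adding a row to bindingPower and a node type.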
Once you have an AST, turning it into a regex should be straightforward: traverse the tree and emit the matching piece of regex for each node as you go.
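One possible shape for that traversal is sketched below. Regex has no AND operator, so this version (my own choice, not the only option) turns every word into a lookahead, concatenates lookaheads for AND, and uses alternation for OR. The node types and RegexEmitter are hypothetical names. Matching is case-sensitive and by substring, so "pie" would also match inside "piece". I have not checked whether jsoup's selector parser accepts every character of such a pattern inside :matchesOwn(...), so treat that part as untested.

```java
import java.util.regex.Pattern;

interface Node { String toRegex(); }

record Term(String word) implements Node {
    // Lookahead: the literal word occurs somewhere ahead in the text.
    public String toRegex() { return "(?=.*" + Pattern.quote(word) + ")"; }
}

record And(Node left, Node right) implements Node {
    // Two zero-width assertions in a row must both hold at the same position.
    public String toRegex() { return left.toRegex() + right.toRegex(); }
}

record Or(Node left, Node right) implements Node {
    // Either assertion may hold; the non-capturing group keeps the alternation contained.
    public String toRegex() { return "(?:" + left.toRegex() + "|" + right.toRegex() + ")"; }
}

class RegexEmitter {
    // (?s) lets '.' cross line breaks; ^ pins every lookahead to the start of the text.
    static String emit(Node root) { return "(?s)^" + root.toRegex(); }

    public static void main(String[] args) {
        // Hand-built AST for the assumed input: apple AND (pie OR tart)
        Node ast = new And(new Term("apple"), new Or(new Term("pie"), new Term("tart")));
        Pattern p = Pattern.compile(emit(ast));
        System.out.println(p.pattern());
        System.out.println(p.matcher("a tart made with apple").find()); // true
        System.out.println(p.matcher("apple juice").find());            // false
    }
}
```

Because every node produces a zero-width assertion, nesting AND inside OR (or the reverse) needs no special cases, and Pattern.quote keeps regex metacharacters typed by the user from being interpreted.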
Good luck!