Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to transform instaparse output into a function that can be evaluated?

I am using instaparse to parse a simple query language used by the end user that evaluates to a Boolean result, eg "(AGE > 35) AND (GENDER = "MALE")", this query then needs to be applied to many thousands of rows of data to decide if each row meets the expression or not.

My question is what's the best way to transform the output of instaparse to a function that an be subsequently evaluated against each row? Eg the above query would be transformed to something like

fn [AGE GENDER] (AND (= AGE 35) (= GENDER "MALE"))

Please note I am a Clojure noob...

like image 611
David Collie Avatar asked Jan 19 '14 12:01

David Collie


2 Answers

You can write a little compiler for the query language, using instaparse to produce a parse tree, regular Clojure functions to transform it into Clojure code and finally eval to produce a Clojure function that you can then apply to your records.

The initial call to eval will be somewhat expensive, but the resulting function will be equivalent to one written by hand in a source file and carry no performance penalty. In fact, this is one of the rare valid use cases for eval -- producing a function whose code is constructed in a truly dynamic fashion which will then be called a large number of times.

Clearly when following this approach you'll need to make sure that you're not unwittingly allowing untrusted sources to execute arbitrary code.

To demonstrate, here's an instaparse parser based on a very simple grammar that's only just capable of parsing your sample query:

(def p (insta/parser "

expr = and-expr | pred
and-expr = <'('> expr <')'> ws? <'AND'> ws? <'('> expr <')'>
pred = (atom ws? rel ws? atom)
rel = '<' | '>' | '='
atom = symbol | number | string
symbol = #'[A-Z]+'
string = <'\"'> #'[A-Za-z0-9]+' <'\"'>
number = #'\\d+'
<ws> = <#'\\s+'>

"))

For the sample query, this produces the following parse tree:

[:expr
 [:and-expr
  [:expr
   [:pred [:atom [:symbol "AGE"]] [:rel ">"] [:atom [:number "35"]]]]
  [:expr
   [:pred
    [:atom [:symbol "GENDER"]]
    [:rel "="]
    [:atom [:string "MALE"]]]]]]

We can now write a multimethod to transform this into a Clojure expression while collecting symbols; the ctx argument here is meant to be an atom holding the set of symbols encountered so far:

(defmulti expr-to-sexp (fn [expr ctx] (first expr)))

(defmethod expr-to-sexp :symbol [[_ name] ctx]
  (let [name (clojure.string/lower-case name)
        sym  (symbol name)]
    (swap! ctx conj sym)
    sym))

(defmethod expr-to-sexp :string [[_ s] ctx]
  s)

(defmethod expr-to-sexp :number [[_ n] ctx]
  (Long/parseLong n))

(defmethod expr-to-sexp :atom [[_ a] ctx]
  (expr-to-sexp a ctx))

(defmethod expr-to-sexp :rel [[_ name] ctx]
  (symbol "clojure.core" name))

(defmethod expr-to-sexp :pred [[_ left rel right] ctx]
  (doall (map #(expr-to-sexp % ctx) [rel left right])))

(defmethod expr-to-sexp :and-expr [[_ left right] ctx]
  `(and ~(expr-to-sexp left ctx) ~(expr-to-sexp right ctx)))

(defmethod expr-to-sexp :expr [[_ child] ctx]
  (expr-to-sexp child ctx))

Let's apply this to our sample parse tree:

(expr-to-sexp (p "(AGE > 35) AND (GENDER = \"MALE\")") (atom #{}))
;= (clojure.core/and (clojure.core/> age 35) (clojure.core/= gender "MALE"))

(let [ctx (atom #{})]
  (expr-to-sexp (p "(AGE > 35) AND (GENDER = \"MALE\")") ctx)
  @ctx)
;= #{age gender}

Finally, here's a function using the above to build a Clojure function:

(defn compile-expr [expr-string]
  (let [expr (p expr-string)
        ctx  (atom #{})
        body (expr-to-sexp expr ctx)]
    (eval `(fn [{:keys ~(vec @ctx)}] ~body))))

You can use it like so:

(def valid? (compile-expr "(AGE > 35) AND (GENDER = \"MALE\")"))

(valid? {:gender "MALE" :age 36})
;= true

(valid? {:gender "FEMALE" :age 36})
;= false
like image 118
Michał Marczyk Avatar answered Oct 06 '22 07:10

Michał Marczyk


I am not sure that I understand your question. However, here is my guess :)

You can use clojure's "filter" function to restrct a collection. I'm not familiar with instaparse, so I'll just mock up some test data (collection):

(def ls  [{:age 10, :gender "m"} {:age 15 :gender "fm"}] )

So "ls" is our collection. To restrict the collection to elements which have a gender value of "m" and a age value greated then 5, we apply the following filter:

(filter (fn [a] (if (and (= (:gender a) "m") (> (:age a) 5)) a)) ls)

result is:

 ({:age 10, :gender "m"})

or you can express the filter "fn" as an anonymous function:

(filter #(if (and (= (:gender %1) "m") (> (:age %1) 5)) %1) ls)

Hope that helps. Note! It migh help if you can post a sample of the "result data" you are trying to process:)

like image 39
lorinpa Avatar answered Oct 06 '22 07:10

lorinpa