I am using instaparse to parse a simple query language used by the end user that evaluates to a Boolean result, eg "(AGE > 35) AND (GENDER = "MALE")", this query then needs to be applied to many thousands of rows of data to decide if each row meets the expression or not.
My question is what's the best way to transform the output of instaparse to a function that an be subsequently evaluated against each row? Eg the above query would be transformed to something like
fn [AGE GENDER] (AND (= AGE 35) (= GENDER "MALE"))
Please note I am a Clojure noob...
You can write a little compiler for the query language, using instaparse to produce a parse tree, regular Clojure functions to transform it into Clojure code and finally eval
to produce a Clojure function that you can then apply to your records.
The initial call to eval
will be somewhat expensive, but the resulting function will be equivalent to one written by hand in a source file and carry no performance penalty. In fact, this is one of the rare valid use cases for eval
-- producing a function whose code is constructed in a truly dynamic fashion which will then be called a large number of times.
Clearly when following this approach you'll need to make sure that you're not unwittingly allowing untrusted sources to execute arbitrary code.
To demonstrate, here's an instaparse parser based on a very simple grammar that's only just capable of parsing your sample query:
(def p (insta/parser "
expr = and-expr | pred
and-expr = <'('> expr <')'> ws? <'AND'> ws? <'('> expr <')'>
pred = (atom ws? rel ws? atom)
rel = '<' | '>' | '='
atom = symbol | number | string
symbol = #'[A-Z]+'
string = <'\"'> #'[A-Za-z0-9]+' <'\"'>
number = #'\\d+'
<ws> = <#'\\s+'>
"))
For the sample query, this produces the following parse tree:
[:expr
[:and-expr
[:expr
[:pred [:atom [:symbol "AGE"]] [:rel ">"] [:atom [:number "35"]]]]
[:expr
[:pred
[:atom [:symbol "GENDER"]]
[:rel "="]
[:atom [:string "MALE"]]]]]]
We can now write a multimethod to transform this into a Clojure expression while collecting symbols; the ctx
argument here is meant to be an atom holding the set of symbols encountered so far:
(defmulti expr-to-sexp (fn [expr ctx] (first expr)))
(defmethod expr-to-sexp :symbol [[_ name] ctx]
(let [name (clojure.string/lower-case name)
sym (symbol name)]
(swap! ctx conj sym)
sym))
(defmethod expr-to-sexp :string [[_ s] ctx]
s)
(defmethod expr-to-sexp :number [[_ n] ctx]
(Long/parseLong n))
(defmethod expr-to-sexp :atom [[_ a] ctx]
(expr-to-sexp a ctx))
(defmethod expr-to-sexp :rel [[_ name] ctx]
(symbol "clojure.core" name))
(defmethod expr-to-sexp :pred [[_ left rel right] ctx]
(doall (map #(expr-to-sexp % ctx) [rel left right])))
(defmethod expr-to-sexp :and-expr [[_ left right] ctx]
`(and ~(expr-to-sexp left ctx) ~(expr-to-sexp right ctx)))
(defmethod expr-to-sexp :expr [[_ child] ctx]
(expr-to-sexp child ctx))
Let's apply this to our sample parse tree:
(expr-to-sexp (p "(AGE > 35) AND (GENDER = \"MALE\")") (atom #{}))
;= (clojure.core/and (clojure.core/> age 35) (clojure.core/= gender "MALE"))
(let [ctx (atom #{})]
(expr-to-sexp (p "(AGE > 35) AND (GENDER = \"MALE\")") ctx)
@ctx)
;= #{age gender}
Finally, here's a function using the above to build a Clojure function:
(defn compile-expr [expr-string]
(let [expr (p expr-string)
ctx (atom #{})
body (expr-to-sexp expr ctx)]
(eval `(fn [{:keys ~(vec @ctx)}] ~body))))
You can use it like so:
(def valid? (compile-expr "(AGE > 35) AND (GENDER = \"MALE\")"))
(valid? {:gender "MALE" :age 36})
;= true
(valid? {:gender "FEMALE" :age 36})
;= false
I am not sure that I understand your question. However, here is my guess :)
You can use clojure's "filter" function to restrct a collection. I'm not familiar with instaparse, so I'll just mock up some test data (collection):
(def ls [{:age 10, :gender "m"} {:age 15 :gender "fm"}] )
So "ls" is our collection. To restrict the collection to elements which have a gender value of "m" and a age value greated then 5, we apply the following filter:
(filter (fn [a] (if (and (= (:gender a) "m") (> (:age a) 5)) a)) ls)
result is:
({:age 10, :gender "m"})
or you can express the filter "fn" as an anonymous function:
(filter #(if (and (= (:gender %1) "m") (> (:age %1) 5)) %1) ls)
Hope that helps. Note! It migh help if you can post a sample of the "result data" you are trying to process:)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With