We are building our own query language similar to Mysql using antlr4. Except we only use where clause
, in other words user does not enter select/from
statements.
I was able to create grammar for it and generate lexers/parsers/listeners in golang.
Below our grammar file EsDslQuery.g4:
grammar EsDslQuery;
options {
language = Go;
}
query
: leftBracket = '(' query rightBracket = ')' #bracketExp
| leftQuery=query op=OR rightQuery=query #orLogicalExp
| leftQuery=query op=AND rightQuery=query #andLogicalExp
| propertyName=attrPath op=COMPARISON_OPERATOR propertyValue=attrValue #compareExp
;
attrPath
: ATTRNAME ('.' attrPath)?
;
fragment ATTR_NAME_CHAR
: '-' | '_' | ':' | DIGIT | ALPHA
;
fragment DIGIT
: ('0'..'9')
;
fragment ALPHA
: ( 'A'..'Z' | 'a'..'z' )
;
attrValue
: BOOLEAN #boolean
| NULL #null
| STRING #string
| DOUBLE #double
| '-'? INT EXP? #long
;
...
Query example: color="red" and price=20000 or model="hyundai" and (seats=4 or year=2001)
ElasticSearch supports sql queries with plugin here: https://github.com/elastic/elasticsearch/tree/master/x-pack/plugin/sql.
Having hard time to understand java code.
Since we have Logical Operators I am quite not sure how to get parse tree and convert it to ES query. Can somebody help/suggest ideas?
Update 1: Added more examples with corresponding ES query
Query Example 1: color="red" AND price=2000
ES query 1:
{
"query": {
"bool": {
"must": [
{
"terms": {
"color": [
"red"
]
}
},
{
"terms": {
"price": [
2000
]
}
}
]
}
},
"size": 100
}
Query Example 2: color="red" AND price=2000 AND (model="hyundai" OR model="bmw")
ES query 2:
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": {
"terms": {
"color": ["red"]
}
}
}
},
{
"bool": {
"must": {
"terms": {
"price": [2000]
}
}
}
},
{
"bool": {
"should": [
{
"term": {
"model": "hyundai"
}
},
{
"term": {
"region": "bmw"
}
}
]
}
}
]
}
},
"size": 100
}
Query Example 3: color="red" OR color="blue"
ES query 3:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": {
"terms": {
"color": ["red"]
}
}
}
},
{
"bool": {
"must": {
"terms": {
"color": ["blue"]
}
}
}
}
]
}
},
"size": 100
}
Elasticsearch has the speed, scale, and flexibility your data needs — and it speaks SQL. Use traditional database syntax to unlock non-traditional performance, like full text search across petabytes of data with real-time results.
You want Elasticsearch when you're doing a lot of text search, where traditional RDBMS databases are not performing really well (poor configuration, acts as a black-box, poor performance). Elasticsearch is highly customizable, extendable through plugins. You can build robust search without much knowledge quite fast.
Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of clauses: Leaf query clauses.
Working demo url: https://github.com/omurbekjk/convert-dsl-to-es-query-with-antlr, estimated time spent: ~3 weeks
After investigating antlr4 and several examples I found simple solution with listener and stack. Similar to how expressions are calculated using stack.
We need to overwrite to default base listener with ours to get triggers for each enter/exit grammar rules. Important rules are:
Below my custom listener code written in golang:
package parser
import (
"github.com/olivere/elastic"
"strings"
)
type MyDslQueryListener struct {
*BaseDslQueryListener
Stack []*elastic.BoolQuery
}
func (ql *MyDslQueryListener) ExitCompareExp(c *CompareExpContext) {
boolQuery := elastic.NewBoolQuery()
attrName := c.GetPropertyName().GetText()
attrValue := strings.Trim(c.GetPropertyValue().GetText(), `\"`)
// Based on operator type we build different queries, default is terms query(=)
termsQuery := elastic.NewTermQuery(attrName, attrValue)
boolQuery.Must(termsQuery)
ql.Stack = append(ql.Stack, boolQuery)
}
func (ql *MyDslQueryListener) ExitAndLogicalExp(c *AndLogicalExpContext) {
size := len(ql.Stack)
right := ql.Stack[size-1]
left := ql.Stack[size-2]
ql.Stack = ql.Stack[:size-2] // Pop last two elements
boolQuery := elastic.NewBoolQuery()
boolQuery.Must(right)
boolQuery.Must(left)
ql.Stack = append(ql.Stack, boolQuery)
}
func (ql *MyDslQueryListener) ExitOrLogicalExp(c *OrLogicalExpContext) {
size := len(ql.Stack)
right := ql.Stack[size-1]
left := ql.Stack[size-2]
ql.Stack = ql.Stack[:size-2] // Pop last two elements
boolQuery := elastic.NewBoolQuery()
boolQuery.Should(right)
boolQuery.Should(left)
ql.Stack = append(ql.Stack, boolQuery)
}
And main file:
package main
import (
"encoding/json"
"fmt"
"github.com/antlr/antlr4/runtime/Go/antlr"
"github.com/omurbekjk/convert-dsl-to-es-query-with-antlr/parser"
)
func main() {
fmt.Println("Starting here")
query := "price=2000 OR model=\"hyundai\" AND (color=\"red\" OR color=\"blue\")"
stream := antlr.NewInputStream(query)
lexer := parser.NewDslQueryLexer(stream)
tokenStream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)
dslParser := parser.NewDslQueryParser(tokenStream)
tree := dslParser.Start()
listener := &parser.MyDslQueryListener{}
antlr.ParseTreeWalkerDefault.Walk(listener, tree)
esQuery := listener.Stack[0]
src, err := esQuery.Source()
if err != nil {
panic(err)
}
data, err := json.MarshalIndent(src, "", " ")
if err != nil {
panic(err)
}
stringEsQuery := string(data)
fmt.Println(stringEsQuery)
}
/** Generated es query
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"bool": {
"should": [
{
"bool": {
"must": {
"term": {
"color": "blue"
}
}
}
},
{
"bool": {
"must": {
"term": {
"color": "red"
}
}
}
}
]
}
},
{
"bool": {
"must": {
"term": {
"model": "hyundai"
}
}
}
}
]
}
},
{
"bool": {
"must": {
"term": {
"price": "2000"
}
}
}
}
]
}
}
*/
Have you thought about converting your sql-like statements to query string queries?
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"query_string" : {
"query" : "(new york city) OR (big apple)",
"default_field" : "content"
}
}
}
'
If your use-cases stay simple like color="red" and price=20000 or model="hyundai" and (seats=4 or year=2001)
, I'd go with the above. The syntax is quite powerful but the queries are guaranteed to run more slowly than the native, spelled-out DSL queries since the ES parser will need to convert them to the DSL for you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With