 

Reading ES from Spark with the elasticsearch-spark connector: all the fields are returned

I've done some experiments in the spark-shell with the elasticsearch-spark connector. Invoking Spark:

] $SPARK_HOME/bin/spark-shell --master local[2] --jars ~/spark/jars/elasticsearch-spark-20_2.11-5.1.2.jar

In the scala shell:

scala> import org.elasticsearch.spark._
scala> val es_rdd = sc.esRDD("myindex/mytype", query = myquery)

It works well: the result contains the correct records as specified in myquery. The only issue is that I get all the fields, even though I specify a subset of fields in the query. Example:

myquery = """{"query":..., "fields":["a","b"], "size":10}"""

returns all the fields, not only a and b. (By the way, I noticed that the size parameter is not taken into account either: the result contains more than 10 records.) It may be important to add that the fields are nested; a and b are actually doc.a and doc.b.
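
For example, just dumping the field names of the first record (es_rdd is the RDD from above; each element is an (id, field map) tuple) shows every field in the document, not only the ones listed in fields:

scala> es_rdd.take(1).foreach { case (id, doc) => println(s"$id -> ${doc.keys.mkString(", ")}") }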

Is it a bug in the connector or do I have the wrong syntax?

asked Aug 31 '25 by Patrick


1 Answer

With the RDD API, the Spark-Elasticsearch connector handles the fields itself, so you cannot apply a projection through the query.

If you want fine-grained control over the mapping, you should use DataFrames instead, which are basically RDDs plus a schema.

Predicate pushdown should also be enabled to translate (push down) Spark SQL into the Elasticsearch Query DSL.

Now a semi-full example:

val myQuery = """{"query": ...}"""
val df = spark.read.format("org.elasticsearch.spark.sql")
  .option("query", myQuery)
  .option("pushdown", "true")
  .load("myindex/mytype")
  .limit(10)        // instead of size
  .select("a", "b") // instead of fields
answered Sep 02 '25 by eliasah