 

How to use Apache spark as Query Engine?

I am using Apache Spark for big data processing. The data is loaded into DataFrames from a flat-file source or a JDBC source. The job is to search for specific records in the DataFrame using Spark SQL.

I have to run the job again and again for new search terms, and every time I have to submit the jar via spark-submit to get the results. Since the data is 40.5 GB, it becomes tedious to reload the same data into a DataFrame for every query.

So what I need is:

  • a way to load the data into a DataFrame once and query it multiple times without submitting the jar each time
  • whether Spark can be used as a search engine / query engine
  • whether the DataFrame can be queried remotely via a REST API

> The current configuration of my Spark deployment is:

  • 5-node cluster
  • runs on YARN as the resource manager

I have tried spark-jobserver, but it also re-runs the job every time.

Asked Oct 31 '25 by PradhanKamal

1 Answer

You might be interested in the HiveThriftServer2 and Spark integration.

Basically, you start a Hive Thrift Server and inject a HiveContext built from your SparkContext:

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// sc is your existing SparkContext
val sql = new HiveContext(sc)
sql.setConf("hive.server2.thrift.port", "10001")

// Load the data once, then expose the DataFrame as a table
dataFrame.registerTempTable("myTable")

// Start the Thrift server backed by this context
HiveThriftServer2.startWithContext(sql)

There are several client libraries and tools to query the server: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

including the CLI tool beeline.
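For example, once the server is running you can open an interactive SQL session against the registered table without resubmitting any jar. This is a sketch assuming the server runs on localhost with the port 10001 configured above, no authentication, and the hypothetical table name `myTable` from the snippet:

```
# Connect beeline to the Thrift server started from the Spark job
beeline -u jdbc:hive2://localhost:10001

# At the beeline prompt, query the cached DataFrame with plain SQL:
# 0: jdbc:hive2://localhost:10001> SELECT * FROM myTable WHERE name = 'foo';
```

Because the DataFrame stays registered in the long-running Spark application, each new query reuses the already-loaded data instead of reloading the 40.5 GB from source.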

Reference: https://medium.com/@anicolaspp/apache-spark-as-a-distributed-sql-engine-4373e254e0f9#.3ntbhdxvr

Answered Nov 03 '25 by Piotr Reszke


