Spark SQL JDBC Support

Tags:

apache-spark

We are currently building a reporting platform and have been using Shark as the data store. Since development of Shark has stopped, we are now evaluating Spark SQL. Based on our use cases, we have a few questions.

1) We have data in various sources (MySQL, Oracle, Cassandra, Mongo). How can we get this data into Spark SQL? Is there a utility we can use? Does it support continuous refresh of the data (syncing new adds/updates/deletes on the data store to Spark SQL)?

2) Is there a way to create multiple databases in Spark SQL?

3) For the reporting UI we use Jasper, and we would like to connect from Jasper to Spark SQL. From our initial search we learned that there is currently no support for consumers to connect to Spark SQL through JDBC, but that it is planned for a future release. When will Spark SQL have a stable release with JDBC support? Meanwhile, we took the source code from https://github.com/amplab/shark/tree/sparkSql but had some difficulty setting it up locally and evaluating it. It would be great if you could help us with setup instructions. (I can share the issue we are facing; please let me know where I can post the error logs.)

4) We also need a SQL prompt where we can execute queries. Currently the Spark shell provides a Scala prompt where Scala code can be executed, and from Scala code we can fire SQL queries. As with Shark, we would like a SQL prompt in Spark SQL. Our search suggested this will be added in a future release of Spark. It would be great if you could tell us which release will address this.


asked Jul 08 '14 12:07

user2847246

1 Answer

As for:

3) Spark 1.1 provides better support for the Spark SQL Thrift server interface, which you may want to use for JDBC access. Hive JDBC clients that support v0.12.0 can connect to and work with this server.

4) Spark 1.1 also provides a Spark SQL CLI that can be used for entering queries, in the same fashion as the Hive CLI or Impala shell.
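
To illustrate point 3, here is a minimal Java sketch of what a JDBC client talking to the Thrift server might look like. It is not taken from the Spark docs verbatim and makes several assumptions: the server is already running and reachable at localhost:10000, a Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) compatible with the server is on the classpath, and "spark_user" and the "default" database are placeholders. The helper jdbcUrl and the class name are hypothetical names used only for this example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch only: host, port, database, and user below are placeholders, not values from the question.
class SparkSqlJdbcSketch {

    // Builds a HiveServer2-style JDBC URL (the jdbc:hive2:// scheme used by Hive JDBC clients).
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        String url = jdbcUrl("localhost", 10000, "default");

        // Assumption: the Hive JDBC driver jar is on the classpath and the Thrift server is up.
        // Older drivers may need an explicit Class.forName("org.apache.hive.jdbc.HiveDriver") first.
        try (Connection conn = DriverManager.getConnection(url, "spark_user", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            // Print the first column of each returned row.
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            // Reached when the driver is missing or the server is not reachable.
            System.err.println("Could not query " + url + ": " + e.getMessage());
        }
    }
}
```

A reporting tool such as Jasper would normally be configured with the same kind of driver class and URL rather than with hand-written code, so treat this only as a quick connectivity check.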

Please provide more details about what you are trying to achieve for 1 and 2.


answered Oct 01 '22 06:10

rudygodoy
