
SparkSQL, Thrift Server and Tableau

I am wondering if there is a way to make a SparkSQL table in sqlContext directly visible to other processes, for example Tableau.

I did some research on the Thrift server, but I didn't find any specific explanation of it. Is it middleware between Hive (the database) and the application (the client)? If so, do I need to write into a Hive table in my Spark program?

When I use Beeline to check the tables from the Thrift server, there is a field called isTempTable. Could someone explain what it means? I'm guessing it marks a temp table in the sqlContext of the Thrift server, because I read that the Thrift server is a Spark driver program and all cached tables are visible across multiple programs. My confusion here is: if it is a driver program, where are the workers?

To summarize,

  1. Where should I write my DataFrame, or the tables in sqlContext, to? Which method should I use (for example, dataFrame.write.mode(SaveMode.Append).saveAsTable())?
  2. Should the default settings be used for the Thrift server, or are changes necessary?

Thanks

asked Jul 23 '15 by user3693309



1 Answer

I assume you've moved on by now, but for anyone who comes across this answer, the Thrift server is effectively a broker between a JDBC connection and SparkSQL.

Once you've got Thrift running (see the Spark docs for a basic intro), you connect to it over JDBC using the Hive JDBC driver, and it in turn relays your SQL queries to Spark through a HiveContext.
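
For anyone setting this up, here's a minimal sketch of starting the Thrift server and connecting with Beeline (assuming a standard Spark installation, run from the Spark home directory, and the default port 10000; the username myuser is a placeholder for your environment):

# start the Thrift (JDBC/ODBC) server
./sbin/start-thriftserver.sh

# connect with Beeline over JDBC on the default port
./bin/beeline -u jdbc:hive2://localhost:10000 -n myuser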

If you have a full Hive metastore up and running, you should be able to see the Hive tables in your JDBC client immediately; otherwise you can create tables on demand by running commands like these in your JDBC client:

CREATE TABLE data1 USING org.apache.spark.sql.parquet OPTIONS (path "/path/to/parquetfile");
CREATE TABLE data2 USING org.apache.spark.sql.json OPTIONS (path "/path/to/jsonfile");
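
As for where to write your DataFrame from a Spark program (question 1 above), here's a minimal sketch, assuming Spark 1.x where sqlContext is a HiveContext backed by the same metastore the Thrift server reads from; the table name events and the file path are hypothetical:

import org.apache.spark.sql.SaveMode

// read some source data into a DataFrame
val df = sqlContext.read.json("/path/to/jsonfile")

// persist it as a metastore table so JDBC clients
// (Beeline, Tableau, etc.) connecting via Thrift can see it
df.write.mode(SaveMode.Append).saveAsTable("events")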

Hope this helps a little.

answered Oct 24 '22 by Ewan Leith