
SparkSQL, Thrift Server and Tableau

I am wondering if there is a way to make a SparkSQL table in sqlContext directly visible to other processes, for example Tableau.

I did some research on the Thrift server, but I didn't find any specific explanation of it. Is it middleware between Hive (the database) and the application (the client)? If so, do I need to write into a Hive table in my Spark program?

When I use Beeline to check the tables from the Thrift server, there is a field called isTempTable. Could someone explain what it means? I'm guessing it marks a temp table in the sqlContext of the Thrift server, because I read that the Thrift server is a Spark driver program and all cached tables are visible across multiple programs. My confusion here is: if it is a driver program, where are the workers?

To summarize,

  1. Where should I write my DataFrame, or the tables in sqlContext, to? Which method should I use (for example, dataFrame.write.mode(SaveMode.Append).saveAsTable())?
  2. Should the default settings be used for the Thrift server, or are changes necessary?

Thanks

asked Jul 23 '15 by user3693309



1 Answer

I assume you've moved on by now, but for anyone who comes across this answer, the Thrift server is effectively a broker between a JDBC connection and SparkSQL.

Once you've got Thrift running (see the Spark docs for a basic intro), you connect to it over JDBC using the Hive JDBC driver, and it in turn relays your SQL queries to Spark through a HiveContext.
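
For anyone setting this up, here's a minimal sketch of starting the Thrift server and connecting with Beeline (assuming a standard Spark installation, run from the Spark home directory, and the default port 10000; the username myuser is a placeholder for your environment):

# start the Thrift (JDBC/ODBC) server
./sbin/start-thriftserver.sh

# connect with Beeline over JDBC on the default port
./bin/beeline -u jdbc:hive2://localhost:10000 -n myuser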

If you have a full Hive metastore up and running, you should be able to see the Hive tables in your JDBC client immediately; otherwise you can create tables on demand by running commands like these in your JDBC client:

CREATE TABLE data1 USING org.apache.spark.sql.parquet OPTIONS (path "/path/to/parquetfile");
CREATE TABLE data2 USING org.apache.spark.sql.json OPTIONS (path "/path/to/jsonfile");
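
As for where to write your DataFrame from a Spark program (question 1 above), here's a minimal sketch, assuming Spark 1.x where sqlContext is a HiveContext backed by the same metastore the Thrift server reads from; the table name events and the file path are hypothetical:

import org.apache.spark.sql.SaveMode

// read some source data into a DataFrame
val df = sqlContext.read.json("/path/to/jsonfile")

// persist it as a metastore table so JDBC clients
// (Beeline, Tableau, etc.) connecting via Thrift can see it
df.write.mode(SaveMode.Append).saveAsTable("events")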

Hope this helps a little.

answered Oct 24 '22 by Ewan Leith