I am wondering whether there is a way to make a Spark SQL table registered in a sqlContext directly visible to other processes, for example Tableau.
I did some research on the Thrift server, but I couldn't find a clear explanation of it. Is it middleware between Hive (the database) and the application (the client)? If so, do I need to write to a Hive table in my Spark program?
When I use Beeline to check the tables from the Thrift server, there is a field called isTempTable. Could someone explain what it means? My guess is that it marks a temporary table in the sqlContext of the Thrift server, because I have read that the Thrift server is itself a Spark driver program and that all cached tables are visible to multiple programs through it. What confuses me is: if it is a driver program, where are the workers?
To summarize, what is the recommended way to make the table visible to other processes such as Tableau (should I write it into a Hive table with dataFrame.write.mode(SaveMode.Append).saveAsTable())? Thanks.
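For context, the write I have in mind looks roughly like this (a minimal sketch for spark-shell; the input path and table name are placeholders, not my actual job):

// In spark-shell (Spark 1.x), where sc already exists; path and table name are placeholders.
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

// A HiveContext registers tables in the Hive metastore, so other
// applications can see them (unlike registerTempTable).
val hiveContext = new HiveContext(sc)

// Placeholder input; in the real job the DataFrame comes from elsewhere.
val df = hiveContext.read.json("/path/to/input.json")

// Appends to (or creates) a metastore-backed table.
df.write.mode(SaveMode.Append).saveAsTable("my_table")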
Tableau can connect to Spark version 1.2.1 and later. You can use the Spark SQL connector to connect to a Spark cluster on Azure HDInsight, Azure Data Lake, Databricks, or Apache Spark.
Apache Spark is an open-source, distributed processing system used for big data workloads. It utilizes in-memory caching and optimized query execution for fast analytic queries against data of any size.
I assume you've moved on by now, but for anyone who comes across this answer, the Thrift server is effectively a broker between a JDBC connection and SparkSQL.
Once you've got Thrift running (see the Spark docs for a basic intro), you connect to it over JDBC using the Hive JDBC driver, and it in turn relays your SQL queries to Spark through a HiveContext.
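For illustration, a plain JDBC client session against the Thrift server looks roughly like this (a minimal sketch; the host, port, database, and empty credentials are assumptions based on the Thrift server's defaults, not values from your setup):

import java.sql.DriverManager

object ThriftJdbcSketch {
  def main(args: Array[String]): Unit = {
    // The Thrift server speaks the HiveServer2 protocol, so the Hive JDBC driver is used.
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Default Thrift server address is localhost:10000; adjust for your cluster.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    val stmt = conn.createStatement()

    // Any SQL sent over this connection is executed by Spark via the shared HiveContext.
    val rs = stmt.executeQuery("SHOW TABLES")
    while (rs.next()) {
      println(rs.getString(1))
    }

    rs.close(); stmt.close(); conn.close()
  }
}

As far as I know, the isTempTable flag you saw in Beeline marks tables registered only in the Thrift server's own SQL context (e.g. via registerTempTable), as opposed to tables recorded in the Hive metastore.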
If you have a full Hive metastore up and running, you should be able to see the Hive tables in your JDBC client immediately; otherwise you can create tables on demand by running commands like these in your JDBC client:
CREATE TABLE data1 USING org.apache.spark.sql.parquet OPTIONS (path "/path/to/parquetfile");
CREATE TABLE data2 USING org.apache.spark.sql.json OPTIONS (path "/path/to/jsonfile");
Hope this helps a little.