We're using Apache Zeppelin to analyse our datasets. We have some queries we'd like to run that return a large number of results, and we'd like to run them in Zeppelin but save the full result set (the display is limited to 1,000 rows). Is there an easy way to get Zeppelin to save all the results of a query, maybe to an S3 bucket?
Saving as text files: Spark has a function called saveAsTextFile(), which takes a path and writes the contents of the RDD to files under that path. The path is treated as a directory, and multiple output files (one per partition) will be produced in that directory.
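For example, here is a minimal sketch of that approach in a Scala paragraph (the table name and bucket path are placeholders, not from your setup):

// Run the query, then dump each Row as one comma-separated line of text.
val results = sqlContext.sql("select * from table")
results.rdd
  .map(_.mkString(","))                               // flatten each Row into a single line
  .saveAsTextFile("s3://your-bucket/query_output/")   // writes part-* files under this prefix

Note this produces raw text with no header row; the CSV writer shown in the answer below is usually more convenient.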
I managed to whip up a notebook that effectively does what I want, using the Scala interpreter.
// Load the spark-csv package so the CSV output format is available.
z.load("com.databricks:spark-csv_2.10:1.4.0")

val df = sqlContext.sql("""
select * from table
""")

// repartition(1) collapses the result into a single partition, so the
// output directory contains one CSV file instead of many part files.
df.repartition(1).write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("s3://amazon.bucket.com/csv_output/")
It's worth mentioning that the z.load function seemed to work for me one day, but when I tried it again I had to declare it in its own paragraph with the %dep interpreter, and then run the remaining code in the standard Scala interpreter.
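For reference, a sketch of that two-paragraph layout (%dep has to run before the Spark interpreter starts, which is why it needs its own paragraph):

%dep
z.load("com.databricks:spark-csv_2.10:1.4.0")

Then, in a separate paragraph:

%spark
val df = sqlContext.sql("select * from table")
df.repartition(1).write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("s3://amazon.bucket.com/csv_output/")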