I am trying to save a DataFrame to S3 from PySpark on Spark 1.4 using DataFrameWriter:
df = sqlContext.read.format("json").load("s3a://somefile")
df_writer = pyspark.sql.DataFrameWriter(df)
df_writer.partitionBy('col1')\
.saveAsTable('test_table', format='parquet', mode='overwrite')
The Parquet files went to "/tmp/hive/warehouse/....", which is a local temp directory on my driver.
I did set hive.metastore.warehouse.dir in hive-site.xml to an "s3a://...." location, but Spark doesn't seem to respect my Hive warehouse setting.
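For what it's worth, the warehouse location can also be set at runtime on the HiveContext instead of (or in addition to) hive-site.xml; a minimal sketch, with a placeholder bucket name:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="warehouse-dir-example")
sqlContext = HiveContext(sc)

# Same intent as the hive-site.xml entry: point the Hive warehouse at S3
# ("my-bucket" is a placeholder, not a real bucket)
sqlContext.setConf("hive.metastore.warehouse.dir", "s3a://my-bucket/warehouse")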
Use the path option:
df_writer.partitionBy('col1')\
.saveAsTable('test_table', format='parquet', mode='overwrite',
path='s3a://bucket/foo')
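For reference, here is the whole flow with path spelled out; a minimal sketch (bucket, input and column names are placeholders), using the df.write accessor that Spark 1.4 exposes instead of constructing DataFrameWriter by hand:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="save-to-s3-example")
sqlContext = HiveContext(sc)

# Read the source JSON (placeholder location)
df = sqlContext.read.format("json").load("s3a://my-bucket/input/")

# Passing an explicit path makes saveAsTable register the table against the
# S3 location instead of writing into the local Hive warehouse directory
df.write.partitionBy('col1') \
    .saveAsTable('test_table', format='parquet', mode='overwrite',
                 path='s3a://my-bucket/tables/test_table')

With path given, the Parquet files land under that S3 prefix and only the table metadata goes through the metastore, so the hive.metastore.warehouse.dir value no longer decides where the data ends up.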