 

Saving empty DataFrame with known schema (Spark 2.2.1)

Is it possible to save an empty DataFrame with a known schema such that the schema is written to the file, even though it has 0 records?

import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.types.StructType

def example(spark: SparkSession, path: String, schema: StructType): Unit = {
  val dataframe = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
  val dataframeWriter = dataframe.write.mode(SaveMode.Overwrite).format("parquet")
  dataframeWriter.save(path)

  spark.read.load(path) // ERROR!! No files to read, so schema unknown
}
asked Apr 13 '18 by Erik


People also ask

How do you create an empty DataFrame with a specified schema?

To create an empty PySpark DataFrame manually with a schema (column names and data types), first build the schema using StructType and StructField. Then create an empty RDD and pass it, together with the schema, to createDataFrame() on the SparkSession.
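A minimal Scala sketch of those steps (the two columns here are invented for illustration):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().appName("empty-df-example").getOrCreate()

// Hypothetical two-column schema; the names and types are ours, not from the question.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)
))

// An empty RDD[Row] plus the schema yields a 0-row DataFrame with known columns.
val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
emptyDf.printSchema() // prints both columns even though count() == 0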

How do I enforce a schema in Spark DataFrame?

We can create a DataFrame programmatically in three steps:

  1. Create an RDD of Rows from the original RDD.
  2. Create a schema, represented by a StructType, that matches the structure of the Rows in the RDD from step 1.
  3. Apply the schema to the RDD of Rows via the createDataFrame method provided by SQLContext.
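A sketch of those three steps in Scala, using SparkSession (the Spark 2.x entry point that wraps SQLContext); the sample data is invented:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().appName("apply-schema-example").getOrCreate()

// Step 1: an RDD of Rows derived from an original RDD (the sample strings are made up).
val original = spark.sparkContext.parallelize(Seq("1,Alice", "2,Bob"))
val rowRdd = original.map(_.split(",")).map(fields => Row(fields(0), fields(1)))

// Step 2: a StructType matching the structure of those Rows.
val schema = StructType(Seq(
  StructField("id", StringType, nullable = true),
  StructField("name", StringType, nullable = true)
))

// Step 3: apply the schema to the RDD of Rows.
val df = spark.createDataFrame(rowRdd, schema)
df.show()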


2 Answers

This is the answer I received from Databricks Support:

This is actually a known issue in Spark. A fix has already been made in the open-source JIRA: https://issues.apache.org/jira/browse/SPARK-23271. For more details on how this behavior changes from 2.4 onward, see this documentation change: https://github.com/apache/spark/pull/20525/files#diff-d8aa7a37d17a1227cba38c99f9f22511R1808. The behavior will change starting with Spark 2.4. Until then, you need to use one of the following approaches:

  1. Save a dataframe with at least one record, so that its schema is preserved
  2. Save the schema in a JSON file and use it later (a sketch of this follows the list)
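A minimal sketch of the second workaround, assuming we choose where the schema file lives; it round-trips the schema through StructType.json and DataType.fromJson, both part of Spark's public API:

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

import org.apache.spark.sql.types.{DataType, StructType}

// Persist the schema as JSON next to the (possibly empty) data; the path is an assumption.
def saveSchema(schema: StructType, path: String): Unit =
  Files.write(Paths.get(path), schema.json.getBytes(StandardCharsets.UTF_8))

// Restore it later and hand it to the reader explicitly.
def loadSchema(path: String): StructType = {
  val json = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
  DataType.fromJson(json).asInstanceOf[StructType]
}

With the schema restored, spark.read.schema(loadSchema("/tmp/schema.json")).parquet(path) should return an empty DataFrame with the right columns instead of failing on schema inference.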
answered Oct 13 '22 by Erik

I ran into a similar problem with Spark 2.1.0 and solved it by calling repartition before writing.

// repartition(1) guarantees at least one partition, so one parquet file (carrying the schema in its footer) is written even when the DataFrame is empty
df.repartition(1).write.parquet("my/path")
answered Oct 13 '22 by scauglog