
Spark: How to save a dataframe with headers?

saveAsTextFile saves only the data, in a delimited format. How do I save the DataFrame with a header row in Java?

sourceRufFrame.toJavaRDD().map(new TildaDelimiter()).coalesce(1, true).saveAsTextFile(targetSrcFilePath);
user3897533 asked Apr 08 '16 16:04




2 Answers

If you want to save the DataFrame as a CSV file, I would suggest using the spark-csv package. With it you can write the DataFrame with a header as shown below (Scala syntax; from Java, call dataFrame.write() with parentheses).

dataFrame.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter",<your delimiter>)
  .save(output)

For further information, see: https://github.com/databricks/spark-csv

spark-csv is available as a Maven dependency.
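
If adding the package is not an option and you stay with the saveAsTextFile route from the question, the header line has to be built by hand. Below is a minimal, Spark-free sketch of just that step. It assumes the column names come from something like dataFrame.columns() and that each row has already been rendered to a delimited string. The names HeaderedLines, headerLine and withHeader are hypothetical, and the "~" delimiter and sample values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: builds a header row and puts it
// in front of already-delimited data lines.
class HeaderedLines {

    // Joins the column names with the delimiter to form the header row.
    public static String headerLine(String[] columns, String delimiter) {
        return String.join(delimiter, columns);
    }

    // Returns a new list: the header first, then the data lines unchanged.
    public static List<String> withHeader(String[] columns, String delimiter, List<String> rows) {
        List<String> out = new ArrayList<>(rows.size() + 1);
        out.add(headerLine(columns, delimiter));
        out.addAll(rows);
        return out;
    }

    public static void main(String[] args) {
        String[] columns = {"id", "name"};                      // assumed column names
        List<String> rows = Arrays.asList("1~alice", "2~bob");  // assumed sample rows
        for (String line : withHeader(columns, "~", rows)) {
            System.out.println(line);
        }
    }
}
```

In a real Spark job the header would also have to end up as the first line of a single output partition, which this sketch does not address; the built-in header option handles that for you.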

Srini answered Oct 16 '22 07:10


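With Spark 2.x the CSV writer is built in, so no extra package is needed (from Java, call df.write() with parentheses):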

df.write.option("header", "true").csv("path")

Cheers

Chitral Verma answered Oct 16 '22 09:10