
Spark SQL - How to write DataFrame to text file?

I am using Spark SQL for reading and writing Parquet files.

But in some cases, I need to write the DataFrame as a text file instead of JSON or Parquet.

Is there a built-in method for this, or do I have to convert the DataFrame to an RDD and use the saveAsTextFile() method?

asked Mar 15 '16 by Shankar

2 Answers

df.repartition(1).write.option("header", "true").csv("filename.csv")
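For plain text output specifically (as the question asks), Spark 1.6+ also provides DataFrameWriter.text, which requires the DataFrame to have exactly one string column. A minimal Scala sketch, assuming an existing DataFrame df and an output path of your choosing:

```scala
import org.apache.spark.sql.functions.concat_ws

// text() accepts only a single string column, so collapse all
// columns into one comma-joined string first. The column name
// "value" and the output path are illustrative.
df.select(concat_ws(",", df.columns.map(df(_)): _*).as("value"))
  .write
  .text("output_dir")
```

Each row then becomes one line in the output files, much like saveAsTextFile() on an RDD, but without leaving the DataFrame API.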
answered Oct 07 '22 by Igorock

Using Databricks spark-csv you can save directly to a CSV file and load it back from a CSV file afterwards, like this:

// Java: note the semicolon on the import
import org.apache.spark.sql.SQLContext;

SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .load("cars.csv");

df.select("year", "model").write()
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save("newcars.csv");
answered Oct 07 '22 by Radu Ionescu