I am using Spark SQL to read and write Parquet files. In some cases, however, I need to write the DataFrame out as a plain text file instead of JSON or Parquet. Is there a built-in method for this, or do I have to convert the DataFrame to an RDD and then use the saveAsTextFile() method?
In Spark 2.x you can write CSV directly; note that repartition(1) forces a single part file, which is written inside a directory named filename.csv:

df.repartition(1).write.option("header", "true").csv("filename.csv")
Using Databricks Spark-CSV you can save directly to a CSV file and load it back from a CSV file afterwards, like this:
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

SQLContext sqlContext = new SQLContext(sc);

DataFrame df = sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .load("cars.csv");

df.select("year", "model").write()
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save("newcars.csv");