I am trying to write a DataFrame to a text file. When the file contains a single column, I can write it out; when it contains multiple columns, I get this error:
Text data source supports only a single column, and you have 2 columns.
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object replace {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)
    val spark = SparkSession.builder.master("local[1]").appName("Decimal Field Validation").getOrCreate()

    var sourcefile = spark.read.option("header", "true").text("C:/Users/phadpa01/Desktop/inputfiles/decimalvalues.txt")

    // Add a row number (prgrefnbr) in front of each row
    val rowRDD = sourcefile.rdd.zipWithIndex().map(indexedRow => Row.fromSeq((indexedRow._2.toLong + 1) +: indexedRow._1.toSeq))

    // Add a column for prgrefnbr to the schema
    val newstructure = StructType(Array(StructField("PRGREFNBR", LongType)).++(sourcefile.schema.fields))

    // Create a new dataframe containing prgrefnbr
    sourcefile = spark.createDataFrame(rowRDD, newstructure)

    // Fails here: the text data source only supports a single column
    sourcefile.write.mode("overwrite").format("text").save("C:/Users/phadpa01/Desktop/op")
  }
}
text("file_name") to read a file or directory of text files into a Spark DataFrame, and dataframe. write(). text("path") to write to a text file. When reading a text file, each line becomes each row that has string “value” column by default.
First, import the modules and create a spark session and then read the file with spark. read. format(), then create columns and split the data from the txt file show into a dataframe.
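A minimal sketch of that read-and-split flow (the input path, delimiter, and column names here are hypothetical):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.split

val spark = SparkSession.builder.master("local[1]").appName("TextReadExample").getOrCreate()
import spark.implicits._

// Reading a text file yields one row per line, in a single string column named "value"
val raw = spark.read.text("data/input.txt") // hypothetical path

// Split "value" on a delimiter and project the pieces out as named columns
val parsed = raw
  .withColumn("parts", split($"value", ","))
  .selectExpr("parts[0] as col1", "parts[1] as col2")
parsed.show()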
You can convert the DataFrame to an RDD, convert each Row to a String, and write the last line as
val op= sourcefile.rdd.map(_.toString()).saveAsTextFile("C:/Users/phadpa01/Desktop/op")
Edit: As @philantrovert and @Pravinkumar have pointed out, the above will wrap each line in [ and ] in the output file (the result of Row.toString), which is true. The solution is to replace them with empty strings:
val op= sourcefile.rdd.map(_.toString().replace("[","").replace("]", "")).saveAsTextFile("C:/Users/phadpa01/Desktop/op")
One can even use a regex instead:
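For instance, a single replaceAll with a character class behaves the same as the two replace calls above:
sourcefile.rdd
  .map(_.toString().replaceAll("[\\[\\]]", "")) // strips the "[" and "]" that Row.toString adds
  .saveAsTextFile("C:/Users/phadpa01/Desktop/op")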
I would recommend using CSV or another delimited format instead. The following is the most concise/elegant way to write a .tsv in Spark 2+:
import org.apache.spark.sql.SaveMode

val tsvWithHeaderOptions: Map[String, String] = Map(
  ("delimiter", "\t"), // Uses "\t" delimiter instead of default ","
  ("header", "true"))  // Writes a header record with column names

df.coalesce(1)         // Writes to a single file
  .write
  .mode(SaveMode.Overwrite)
  .options(tsvWithHeaderOptions)
  .csv("output/path")