I am trying to write a DataFrame to a text file. When the file contains a single column, I can write it out; when it contains multiple columns, I get this error:
Text data source supports only a single column, and you have 2 columns.
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object replace {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)
    val spark = SparkSession.builder.master("local[1]").appName("Decimal Field Validation").getOrCreate()

    var sourcefile = spark.read.option("header", "true").text("C:/Users/phadpa01/Desktop/inputfiles/decimalvalues.txt")

    // Add a row number (prgrefnbr) in front of each row
    val rowRDD = sourcefile.rdd.zipWithIndex().map(indexedRow => Row.fromSeq((indexedRow._2.toLong + 1) +: indexedRow._1.toSeq))

    // Add a column for prgrefnbr to the schema
    val newstructure = StructType(Array(StructField("PRGREFNBR", LongType)).++(sourcefile.schema.fields))

    // Create a new dataframe containing prgrefnbr
    sourcefile = spark.createDataFrame(rowRDD, newstructure)

    // Fails here: the text data source only supports a single column
    sourcefile.write.mode("overwrite").format("text").save("C:/Users/phadpa01/Desktop/op")
  }
}
text("file_name") to read a file or directory of text files into a Spark DataFrame, and dataframe. write(). text("path") to write to a text file. When reading a text file, each line becomes each row that has string “value” column by default.
First, import the modules and create a spark session and then read the file with spark. read. format(), then create columns and split the data from the txt file show into a dataframe.
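A minimal sketch of that read-and-split flow (the input path, delimiter, and column names here are hypothetical):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.split

val spark = SparkSession.builder.master("local[1]").appName("TextReadExample").getOrCreate()
import spark.implicits._

// Reading a text file yields one row per line, in a single string column named "value"
val raw = spark.read.text("data/input.txt") // hypothetical path

// Split "value" on a delimiter and project the pieces out as named columns
val parsed = raw
  .withColumn("parts", split($"value", ","))
  .selectExpr("parts[0] as col1", "parts[1] as col2")
parsed.show()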
You can convert the DataFrame to an RDD, convert each Row to a String, and write the last line as
val op= sourcefile.rdd.map(_.toString()).saveAsTextFile("C:/Users/phadpa01/Desktop/op")
Edit: As @philantrovert and @Pravinkumar have pointed out, the above will wrap each line in [ and ] in the output file (the result of Row.toString), which is true. The solution is to replace them with empty strings:
val op= sourcefile.rdd.map(_.toString().replace("[","").replace("]", "")).saveAsTextFile("C:/Users/phadpa01/Desktop/op")
One can even use a regex instead:
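For instance, a single replaceAll with a character class behaves the same as the two replace calls above:
sourcefile.rdd
  .map(_.toString().replaceAll("[\\[\\]]", "")) // strips the "[" and "]" that Row.toString adds
  .saveAsTextFile("C:/Users/phadpa01/Desktop/op")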
I would recommend using CSV or another delimited format instead. The following is the most concise/elegant way to write a .tsv in Spark 2+:
import org.apache.spark.sql.SaveMode

val tsvWithHeaderOptions: Map[String, String] = Map(
  ("delimiter", "\t"), // Uses "\t" delimiter instead of default ","
  ("header", "true"))  // Writes a header record with column names

df.coalesce(1)         // Writes to a single file
  .write
  .mode(SaveMode.Overwrite)
  .options(tsvWithHeaderOptions)
  .csv("output/path")