How to save CSV with all fields quoted?

Tags:

scala

apache-spark

spark-csv

The code below does not add double quotes around the fields, even though the double quote is the default quote character. I also tried setting the quote option to # and to a single quote, with no success. I also tried quoteMode with the ALL and NON_NUMERIC options; the output still did not change.

s2d.coalesce(64).write
  .format("com.databricks.spark.csv")
  .option("header", "false")
  .save(fname)

Are there any other options I can try? I am using spark-csv 2.11 over spark 2.1.

Output it produces:

d4c354ef,2017-03-14 16:31:33,2017-03-14 16:31:46,104617772177,340618697

Output I am looking for:

"d4c354ef","2017-03-14 16:31:33","2017-03-14 16:31:46",104617772177,340618697

asked Apr 26 '17 20:04

Arvind Kandaswamy

2 Answers

tl;dr Enable the quoteAll option.

scala> Seq(("hello", 5)).toDF.write.option("quoteAll", true).csv("hello5.csv")

The above gives the following output:

$ cat hello5.csv/part-00000-a0ecb4c2-76a9-4e08-9c54-6a7922376fe6-c000.csv
"hello","5"

That assumes the quote character is " (see CSVOptions).

That, however, quotes every field, including numeric ones (note the "5" in the output above), so it won't give you double quotes around only the non-numeric fields. Sorry.

You can see all the options in CSVOptions, which serves as the source of the options for the CSV reader and writer.

p.s. com.databricks.spark.csv is currently a mere alias for the csv format. You can use both interchangeably, but the shorter csv is preferred.

p.p.s. Use option("header", false) (false as a Boolean, not a String); that will make your code slightly more type-safe.

answered Oct 02 '22 16:10

Jacek Laskowski

In Spark 2.1, where the old CSV library has been inlined, I do not see any option for what you want in the csv method of DataFrameWriter, as seen here.

So I guess you have to map over your data "manually" to determine which of the Row components are non-numbers and quote them accordingly. You could utilize a straightforward isNumeric helper function like this:

def isNumeric(s: String) = s.nonEmpty && s.forall(Character.isDigit)

As you map over your Dataset, quote the values for which isNumeric returns false.
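A minimal sketch of that manual approach, in plain Scala so it can be tried without a cluster. The names QuoteNonNumeric, quote, and toCsvLine are made up for illustration and are not part of Spark or spark-csv. It assumes every field has already been converted to a String, and it treats only digit-only strings as numeric, so negative or decimal numbers would be quoted too.

```scala
object QuoteNonNumeric {
  // Same check as the isNumeric helper above: non-empty and digits only.
  def isNumeric(s: String): Boolean = s.nonEmpty && s.forall(_.isDigit)

  // Hypothetical helper: wrap a field in double quotes, doubling any
  // embedded double quotes so the line stays parseable as CSV.
  def quote(s: String): String = "\"" + s.replace("\"", "\"\"") + "\""

  // Hypothetical helper: build one CSV line, quoting only the fields
  // that are not purely numeric.
  def toCsvLine(fields: Seq[String]): String =
    fields.map(f => if (isNumeric(f)) f else quote(f)).mkString(",")
}
```

For the row in the question, QuoteNonNumeric.toCsvLine(Seq("d4c354ef", "2017-03-14 16:31:33", "2017-03-14 16:31:46", "104617772177", "340618697")) produces "d4c354ef","2017-03-14 16:31:33","2017-03-14 16:31:46",104617772177,340618697. In Spark, one way to use it would be to map each Row to such a line and save the result with the text writer rather than the csv writer, so that Spark does not apply its own quoting on top.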

answered Oct 02 '22 16:10

Vidya
