Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

spark df.write quote all fields but not null values

I am trying to create a csv from values stored in the table:

 | col1   | col2   | col3  |
 | "one"  | null   | "one" |
 | "two"  | "two"  | "two" |

hive > select * from table where col2 is null;
 one   null    one 

I am getting the csv using the below code:

df.repartition(1)
  .write.option("header",true)
  .option("delimiter", ",")
  .option("quoteAll", true)
  .option("nullValue", "")
  .csv(S3Destination)

Csv I get:

"col1","col2","col3"
"one","","one"
"two","two","two"

Expected Csv:WITH NO DOUBLE QUOTES FOR NULL VALUE

"col1","col2","col3"
"one",,"one"
"two","two","two"

Any help is appreciated to know if the dataframe writer has options to do this.

like image 268
dreddy Avatar asked Oct 29 '22 03:10

dreddy


1 Answers

You can go in a udf approach and apply on the column (using withColumn on the repartitioned datafrmae above) where possiblity of double quote empty string is there see below sample code

 sqlContext.udf().register("convertToEmptyWithOutQuotes",(String abc) -> (abc.trim().length() > 0 ? abc : abc.replace("\"", " ")),DataTypes.StringType);

String has replace method which does the job.

val a =  Array("'x'","","z")
println(a.mkString(",").replace("\"", " "))

will produce 'x',,z

like image 180
Ram Ghadiyaram Avatar answered Nov 15 '22 05:11

Ram Ghadiyaram