Spark - Read csv file with quote

Tags:

apache-spark

I have a CSV file which has data contained in double quotes (").

"0001", "A", "001", "2017/01/01 12"

"0001", "B", "002", "2017/01/01 13"

I would like to read only the raw values (without the " symbol).

spark.read
 .option("encoding", encoding)
 .option("header", header)
 .option("quote", quote)
 .option("sep", sep)
 .csv(path) // path: location of the CSV file (variable name assumed, like the others)

The other options work well, but quote does not seem to work properly: the data is still loaded with the quote symbol ("). How can I strip this symbol from the loaded data?


Result of dataframe.show:

+----+----+------+----------------+
| _c0| _c1|   _c2|             _c3|
+----+----+------+----------------+
|0001| "A"| "001"| "2017/01/01 12"|
|0001| "B"| "002"| "2017/01/01 13"|
+----+----+------+----------------+
J.Done asked Jul 24 '17 07:07


1 Answer

You can set the quote option as shown below:

option("quote", "\"")

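If there is an extra space after the separator, as in "abc", "xyz", the field starts with a space rather than with the quote character, so the parser treats the quotes as part of the data. In that case you also need to use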

option("ignoreLeadingWhiteSpace", true)
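
Putting the two options together, a complete read could look roughly like the sketch below. It is only an illustration: the object name ReadQuotedCsv, the helper readSample, the temporary file, and the local[*] master are assumptions made for the example, and the sample rows are the two rows from the question.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadQuotedCsv {

  // Hypothetical helper: writes the two sample rows from the question to a
  // temporary file and reads them back with both options set.
  def readSample(spark: SparkSession): DataFrame = {
    val lines = Seq(
      "\"0001\", \"A\", \"001\", \"2017/01/01 12\"",
      "\"0001\", \"B\", \"002\", \"2017/01/01 13\""
    )
    val path = Files.createTempFile("quoted", ".csv")
    Files.write(path, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))

    spark.read
      .option("header", "false")
      .option("sep", ",")
      .option("quote", "\"")                     // double quote is the quote character
      .option("ignoreLeadingWhiteSpace", "true") // drop the space that follows each comma
      .csv(path.toString)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the options out.
    val spark = SparkSession.builder()
      .appName("read-quoted-csv")
      .master("local[*]")
      .getOrCreate()

    readSample(spark).show()
    spark.stop()
  }
}
```

With ignoreLeadingWhiteSpace enabled, each field should begin with the quote character again, so the quotes should be consumed by the parser instead of ending up in the columns.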

Hope this helps

koiralo answered Oct 12 '22 14:10