I have a CSV file whose values are enclosed in double quotes (").
"0001", "A", "001", "2017/01/01 12"
"0001", "B", "002", "2017/01/01 13"
I would like to read only the data itself, without the " characters.
spark.read
  .option("encoding", encoding)
  .option("header", header)
  .option("quote", quote)
  .option("sep", sep)
The other options work well, but quote does not seem to work properly: the data is loaded with the quote symbol (") still attached. How can I strip this symbol from the loaded data?
Result of dataframe.show:
+----+----+------+----------------+
| _c0| _c1|   _c2|             _c3|
+----+----+------+----------------+
|0001| "A"| "001"| "2017/01/01 12"|
|0001| "B"| "002"| "2017/01/01 13"|
+----+----+------+----------------+
You can set the quote option as below:
option("quote", "\"")
If there is an extra space after the separator, as in "abc", "xyz", the quote character is not recognized at the start of the field, so you also need to use
option("ignoreLeadingWhiteSpace", true)
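Putting the two options together, a minimal Scala sketch might look like the following. The object name QuotedCsvExample, the helpers writeSample and readQuoted, the temporary file, and the local[1] session are assumptions made for illustration and are not part of the original question; the sample rows are the ones shown in the question.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; names are illustrative only.
object QuotedCsvExample {

  // Writes the two sample rows from the question to a temporary file
  // and returns its path as a string.
  def writeSample(): String = {
    val lines = Seq(
      "\"0001\", \"A\", \"001\", \"2017/01/01 12\"",
      "\"0001\", \"B\", \"002\", \"2017/01/01 13\""
    )
    val path = Files.createTempFile("quoted", ".csv")
    Files.write(path, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
    path.toString
  }

  // Reads the CSV with an explicit quote character and with leading
  // whitespace ignored, so that a space after the separator does not
  // prevent the quote from being recognized.
  def readQuoted(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", false)
      .option("sep", ",")
      .option("quote", "\"")
      .option("ignoreLeadingWhiteSpace", true)
      .csv(path)

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("quoted-csv")
      .getOrCreate()

    readQuoted(spark, writeSample()).show()
    spark.stop()
  }
}
```

With both options set, every column should come back without the surrounding quotes. Since no schema is inferred here, all columns are read as strings, so a value such as 0001 keeps its leading zeros.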
Hope this helps