I'm using Spark 2.1 and am trying to read a CSV file.
compile group: 'org.scala-lang', name: 'scala-library', version: '2.11.1'
compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.1.0'
compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.1.0' // provides org.apache.spark.sql.SparkSession used below
Here is my code.
import org.apache.spark.sql.SparkSession

// Build the session; appName/master here are just local-run examples.
val spark = SparkSession.builder()
  .appName("csv-read")
  .master("local[*]")
  .getOrCreate()
spark.read
.option("charset", "utf-8")
.option("header", "true")
.option("quote", "\"")
.option("delimiter", ",")
.csv(...)
It works well. The problem is that the Spark DataFrameReader option keys don't match the reference (link). The reference says I should use 'encoding' to set the encoding, but that option doesn't work, while 'charset' works fine. Is the reference wrong?
Apache PySpark provides csv("path") on the DataFrameReader for reading a CSV file into a Spark DataFrame, and dataframeObj.write.csv("path") for saving or writing a DataFrame to a CSV file. PySpark supports reading files with pipe, comma, tab, and other delimiters/separators.
Reading multiple CSV files into an RDD: Spark RDDs don't have a method for reading CSV files directly, so we use textFile() to read the CSV like any other text file and split each record on the comma, pipe, or other delimiter, as sketched below.
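A minimal Scala sketch of that textFile() approach; the path data/people.csv and the comma delimiter are just assumptions for illustration:

import org.apache.spark.rdd.RDD

// Assumes an existing SparkContext `sc` (e.g. spark.sparkContext).
// Each element of the RDD is one raw line of the file.
val lines: RDD[String] = sc.textFile("data/people.csv")

// Split each record on the delimiter; swap in "|" or "\t" as appropriate.
val fields: RDD[Array[String]] = lines.map(_.split(","))

Note this gives you plain arrays of strings with no header handling or type inference; that is exactly what the DataFrame csv("path") reader adds on top.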
You can see it here, in Spark's CSV option parsing:
val charset = parameters.getOrElse("encoding",
  parameters.getOrElse("charset", StandardCharsets.UTF_8.name()))
Both encoding and charset are valid options, and you should have no problem using either when setting the encoding.
charset is simply there for legacy support, from when the Spark CSV code was the Databricks spark-csv project, which was merged into Spark as of 2.x. That is also where delimiter (now sep) comes from.
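To illustrate, here is a sketch showing the two sets of option names side by side (the path is hypothetical); both reads should behave identically:

// Using the documented option names...
val df1 = spark.read
  .option("encoding", "utf-8")
  .option("sep", ",")
  .option("header", "true")
  .csv("data/people.csv")

// ...and the legacy spark-csv names, which are still accepted as aliases.
val df2 = spark.read
  .option("charset", "utf-8")
  .option("delimiter", ",")
  .option("header", "true")
  .csv("data/people.csv")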
Note the default values for the CSV reader: charset, quote, and delimiter in your code are all just the defaults, so you can remove them. That leaves you with simply:
spark.read.option("header", "true").csv(...)
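If you want to confirm the options took effect, a quick check (path again hypothetical):

val df = spark.read.option("header", "true").csv("data/people.csv")
df.printSchema() // column names should come from the header row
df.show(5)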