I can load multiple files at once by passing multiple paths to the load method, e.g.
spark.read
  .format("com.databricks.spark.avro")
  .load(
    "/data/src/entity1/2018-01-01",
    "/data/src/entity1/2018-01-12",
    "/data/src/entity1/2018-01-14")
I'd like to prepare a list of paths first and pass it to the load method, but I get the following compilation error:
val paths = Seq(
  "/data/src/entity1/2018-01-01",
  "/data/src/entity1/2018-01-12",
  "/data/src/entity1/2018-01-14")
spark.read.format("com.databricks.spark.avro").load(paths)
<console>:29: error: overloaded method value load with alternatives:
  (paths: String*)org.apache.spark.sql.DataFrame <and>
  (path: String)org.apache.spark.sql.DataFrame
 cannot be applied to (List[String])
       spark.read.format("com.databricks.spark.avro").load(paths)
Why? And how can I pass a list of paths to the load method?
You just need to apply the splat operator (: _*) to the paths list:
spark.read.format("com.databricks.spark.avro").load(paths: _*)
The load method accepts a varargs argument, not a List. So you have to explicitly convert the list to varargs by adding : _* in the load call:
spark.read.format("com.databricks.spark.avro").load(paths: _*)
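The same : _* ascription works for any Scala varargs parameter, not just load. A minimal sketch, where countPaths is a hypothetical stand-in for the load method:

```scala
// Minimal illustration of Scala varargs and the ": _*" ascription.
// countPaths is a hypothetical stand-in for DataFrameReader.load.
def countPaths(paths: String*): Int = paths.length

val paths = Seq(
  "/data/src/entity1/2018-01-01",
  "/data/src/entity1/2018-01-12",
  "/data/src/entity1/2018-01-14")

// countPaths(paths)           // does not compile: Seq[String] is not String*
val n = countPaths(paths: _*)  // expands the Seq into three String arguments
```

Without the ascription, the compiler looks for an overload taking a single Seq[String] argument, finds none, and reports the "cannot be applied to (List[String])" error shown above.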