Suppose I have a defined schema for loading 10 CSV files in a folder. Is there a way to automatically load the tables using Spark SQL? I know this can be done by using an individual DataFrame for each file [as below], but can it be automated with a single command? Rather than pointing to a file, can I point to a folder?
df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("../Downloads/2008.csv")
Use a wildcard, e.g. replace 2008 with *:
df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("../Downloads/*.csv") // <-- note the star (*)
// these lines are equivalent in Spark 2.0
spark.read.format("csv").option("header", "true").load("../Downloads/*.csv")
spark.read.option("header", "true").csv("../Downloads/*.csv")
Notes:
Replace format("com.databricks.spark.csv") with format("csv"), or use the csv method instead; the com.databricks.spark.csv format has been integrated into Spark 2.0.

Use spark instead of sqlContext.
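Since the question mentions a defined schema, you can pair the wildcard read with an explicit schema so Spark does not have to infer types from each file. A minimal sketch, assuming hypothetical column names (Year, Month) that you would replace with the columns of your own CSVs:

import org.apache.spark.sql.types._

// hypothetical schema -- substitute your own field names and types
val schema = StructType(Seq(
  StructField("Year", IntegerType, nullable = true),
  StructField("Month", IntegerType, nullable = true)
))

// one read covers every CSV in the folder; .schema() skips inference
val df = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("../Downloads/*.csv")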
Ex1:
Reading a single CSV file. Provide the complete file path:
val df = spark.read.option("header", "true").csv("C:\\spark\\sample_data\\tmp\\cars1.csv")
Ex2:
Reading multiple CSV files by passing their names:
val df = spark.read.option("header", "true").csv("C:\\spark\\sample_data\\tmp\\cars1.csv", "C:\\spark\\sample_data\\tmp\\cars2.csv")
Ex3:
Reading multiple CSV files by passing a list of names:
val paths = List("C:\\spark\\sample_data\\tmp\\cars1.csv", "C:\\spark\\sample_data\\tmp\\cars2.csv")
val df = spark.read.option("header", "true").csv(paths: _*)
Ex4:
Reading multiple CSV files in a folder, ignoring other files:
val df = spark.read.option("header", "true").csv("C:\\spark\\sample_data\\tmp\\*.csv")
Ex5:
Reading multiple CSV files from multiple folders:
val folders = List("C:\\spark\\sample_data\\tmp", "C:\\spark\\sample_data\\tmp1")
val df = spark.read.option("header", "true").csv(folders: _*)
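Since the question asks about loading tables for Spark SQL, once a folder is read into a single DataFrame you can register it as a temporary view and query it with SQL. A minimal sketch, assuming the df from Ex4 and a hypothetical view name cars:

// register the DataFrame so it can be queried through Spark SQL
df.createOrReplaceTempView("cars")

// count the rows loaded from all CSV files in the folder
spark.sql("SELECT COUNT(*) FROM cars").show()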