
pyspark read multiple csv files at once

I'm using Spark to read files from HDFS. In our scenario, a legacy system delivers files in chunks, in CSV format:

ID1_FILENAMEA_1.csv
ID1_FILENAMEA_2.csv
ID1_FILENAMEA_3.csv
ID1_FILENAMEA_4.csv
ID2_FILENAMEA_1.csv
ID2_FILENAMEA_2.csv
ID2_FILENAMEA_3.csv

These files are loaded into the FILENAMEA table in Hive using the Hive Warehouse Connector, with a few transformations such as adding default values. We have around 70 such tables. The Hive tables are created in ORC format and partitioned on ID. Right now I'm processing all these files one by one, which takes a long time.

I want to make this process much faster. The files will be gigabytes in size.

Is there any way to read all the FILENAMEA files at the same time and load them into the Hive table?

Cdr asked Dec 04 '25 00:12


2 Answers

There are two ways to read several CSV files in PySpark. If all the CSV files are in the same directory and share the same schema, you can read them at once by passing the directory path as the argument, as follows:

spark.read.csv('hdfs://path/to/directory')
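
If the directory holds chunks for several tables, a glob pattern can narrow the read to a single table. The sketch below assumes the naming scheme from the question (`<ID>_<TABLE>_<n>.csv`); `table_glob` is a hypothetical helper, and the Spark call appears only as a comment because it needs a running session.

```python
def table_glob(directory: str, table: str) -> str:
    """Build a glob matching every chunk of one table in a directory.

    Assumes files are named <ID>_<TABLE>_<n>.csv, as in the question.
    """
    return f"{directory.rstrip('/')}/*_{table}_*.csv"


pattern = table_glob("hdfs://path/to/directory", "FILENAMEA")

# With an active SparkSession named `spark`, all matching chunks can be
# loaded into a single DataFrame in one call:
# df = spark.read.csv(pattern, header=True)  # header=True is an assumption
```

Note that this simple pattern would also match a table whose name merely ends in `_FILENAMEA`, so it may need tightening for real table names.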

If you have CSV files in different locations, or CSV files in the same directory alongside other CSV/text files, you can pass them as a Python list of paths to the .csv() method, as follows:

spark.read.csv(['hdfs://path/to/filename1', 'hdfs://path/to/filename2'])

You can find more information about how to read a CSV file with Spark here

If you need to build this list of paths from the list of files in an HDFS directory, you can look at this answer. Once you've created your list of paths, you can pass it directly to the .csv() method.
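
As a rough illustration of building the path list, the sketch below groups file names by table so that each table's chunks can be read together. It assumes the `<ID>_<TABLE>_<n>.csv` naming from the question and that the directory listing has already been obtained somehow; `group_paths_by_table` and the sample names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming scheme from the question: <ID>_<TABLE>_<chunk>.csv
CHUNK_NAME = re.compile(r"^(?P<id>[^_]+)_(?P<table>.+)_(?P<chunk>\d+)\.csv$")


def group_paths_by_table(directory, file_names):
    """Map each table name to the sorted full paths of its chunk files."""
    base = directory.rstrip("/")
    groups = defaultdict(list)
    for name in file_names:
        match = CHUNK_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        groups[match.group("table")].append(f"{base}/{name}")
    return {table: sorted(paths) for table, paths in groups.items()}


names = [
    "ID1_FILENAMEA_1.csv",
    "ID1_FILENAMEA_2.csv",
    "ID2_FILENAMEA_1.csv",
    "notes.txt",
]
paths_by_table = group_paths_by_table("hdfs://path/to/directory", names)

# With an active SparkSession named `spark`, one read per table:
# df = spark.read.csv(paths_by_table["FILENAMEA"])
```

Grouping by table means each of the roughly 70 tables needs one read call instead of one call per chunk.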

Vincent Doba answered Dec 05 '25 22:12


Using spark.read.csv(["path1", "path2", "path3", ...]) you can read multiple files from different paths. That means you first have to build a list of the paths: a list, not a string of comma-separated file paths.
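
If the paths arrive as a single comma-separated string, they can be split into a list before the read. This is only a sketch: `to_path_list` and `read_csv_chunks` are hypothetical helpers, and splitting on commas assumes that no path contains a comma itself.

```python
def to_path_list(paths):
    """Return a list of paths from either a list or a comma-separated string."""
    if isinstance(paths, str):
        return [p.strip() for p in paths.split(",") if p.strip()]
    return list(paths)


def read_csv_chunks(spark, paths, **options):
    """Read all paths into one DataFrame by passing a list to spark.read.csv."""
    return spark.read.csv(to_path_list(paths), **options)


# Usage with an active SparkSession named `spark`:
# df = read_csv_chunks(spark, "path1,path2,path3", header=True)
```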

AEChris answered Dec 05 '25 22:12


