I can read a few JSON files at the same time using * (star):
sqlContext.jsonFile('/path/to/dir/*.json')
Is there any way to do the same thing for Parquet? The star doesn't work.
Reading multiple CSV files into an RDD: Spark's RDD API doesn't have a method for reading the CSV format, so we use the textFile() method to read the CSV files like any other text files into an RDD and then split each record on its delimiter (comma, pipe, or anything else).
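A minimal sketch of that textFile()-plus-split approach (the /path/to/dir/*.csv glob and the comma delimiter below are placeholders, not paths from the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-rdd-sketch").getOrCreate()

# textFile() accepts globs, so several CSV files can be read in one call.
lines = spark.sparkContext.textFile("/path/to/dir/*.csv")

# Split each record on the delimiter (comma here; use "|" for pipe-delimited data).
records = lines.map(lambda line: line.split(","))

print(records.take(5))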
Yes, wildcards work for Parquet too: spark.read.parquet() accepts multiple paths, and each path can contain globs.

InputPath = [hdfs_path + "parquets/date=18-07-23/hour=2*/*.parquet",
             hdfs_path + "parquets/date=18-07-24/hour=0*/*.parquet"]
df = spark.read.parquet(*InputPath)
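Since those directories use a Hive-style key=value layout, a related option is to read the base directory and let Spark prune partitions from a filter. This is just a sketch, assuming the same hdfs_path layout as above and that the partition values are read as strings:

# Read the whole partitioned tree; the "basePath" option keeps
# date and hour available as columns for filtering.
df_day = (spark.read
          .option("basePath", hdfs_path + "parquets/")
          .parquet(hdfs_path + "parquets/")
          .where("date = '18-07-23'"))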
FYI, you can also:

- read a subset of the parquet files using the wildcard symbol *:
  sqlContext.read.parquet("/path/to/dir/part_*.gz")
- read multiple parquet files by explicitly specifying them:
  sqlContext.read.parquet("/path/to/dir/part_1.gz", "/path/to/dir/part_2.gz")