
Skip missing files from a Hive table in Spark to avoid FileNotFoundException

I'm reading a Hive table with spark.sql() and then printing the row count, but some of the underlying files are missing or were removed directly from HDFS.

Spark fails with the following error:

Caused by: java.io.FileNotFoundException: File does not exist: hdfs://nameservice1/some path.../data

Hive returns the count without error for the same query. The table is external and partitioned.

I want to ignore the missing files and prevent my Spark job from failing. I searched online and tried setting the config parameters below while creating the Spark session, but with no luck.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .config("spark.sql.hive.verifyPartitionPath", "false")
      .config("spark.sql.files.ignoreMissingFiles", true)
      .config("spark.sql.files.ignoreCorruptFiles", true)
      .enableHiveSupport()
      .getOrCreate()

I referred to https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-properties.html for the above config parameters.

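    val sql = "SELECT count(*) FROM db.table WHERE date=20190710"
    val df = spark.sql(sql)
    // df.count would return 1 (the number of result rows); read the aggregated value instead
    println(df.first().getLong(0))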

I expect the Spark code to complete successfully, without a FileNotFoundException, even if some of the files listed for the partition are missing.

I'm wondering why spark.sql.files.ignoreMissingFiles has no effect.

The Spark version is 2.2.0.cloudera1. Any suggestions are welcome. Thanks in advance.

Gopal Tiwari asked Dec 28 '25 21:12



1 Answer

Setting the following config parameter resolved the issue:

For Hive:

    mapred.input.dir.recursive=true

For Spark Session:

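    import org.apache.spark.sql.SparkSession

    // Read data files from nested subdirectories of each partition location
    val spark = SparkSession.builder()
      .config("mapred.input.dir.recursive", true)
      .enableHiveSupport()
      .getOrCreate()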
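As a hedged variant of the session config above: Spark forwards any key prefixed with `spark.hadoop.` to the underlying Hadoop Configuration, so the same property can be passed that way. The sketch below is an assumption, not what was tested in this answer. It also sets `mapreduce.input.fileinputformat.input.dir.recursive`, the newer Hadoop name for the same property, and whether it behaves identically on 2.2.0.cloudera1 has not been verified here.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes Spark with Hive support is on the classpath.
val spark = SparkSession.builder()
  // The "spark.hadoop." prefix forwards the rest of the key to the Hadoop Configuration.
  .config("spark.hadoop.mapred.input.dir.recursive", "true")
  // Assumption: newer Hadoop name of the same property, set as well for safety.
  .config("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", "true")
  .enableHiveSupport()
  .getOrCreate()
```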

On further analysis I found that only part of the partition directory path is registered as the partition location in the table. Under that location there are many different folders, and the actual data files sit inside those folders. So recursive discovery has to be turned on in Spark to read the data.
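To illustrate the layout described above, here is a minimal sketch in plain Scala on the local filesystem (no Spark or HDFS involved). The folder names `batch_1` and `batch_2`, the file name `part-00000`, and the object and method names are hypothetical. It only shows why a listing that stops at the registered location finds no data files while a recursive one does. It is not how Spark or Hive implement the lookup.

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

object RecursiveListingSketch {
  // Regular files directly inside `dir` (what a non-recursive listing sees).
  def listFlat(dir: Path): Seq[Path] = {
    val stream = Files.list(dir)
    try stream.iterator().asScala.filter(p => Files.isRegularFile(p)).toList
    finally stream.close()
  }

  // Regular files anywhere below `dir` (what a recursive listing sees).
  def listRecursive(dir: Path): Seq[Path] = {
    val stream = Files.walk(dir)
    try stream.iterator().asScala.filter(p => Files.isRegularFile(p)).toList
    finally stream.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical layout: <location>/batch_1/part-00000, <location>/batch_2/part-00000
    val location = Files.createTempDirectory("partition_")
    for (sub <- Seq("batch_1", "batch_2")) {
      val folder = Files.createDirectory(location.resolve(sub))
      Files.createFile(folder.resolve("part-00000"))
    }
    println(s"flat: ${listFlat(location).size}")           // prints "flat: 0"
    println(s"recursive: ${listRecursive(location).size}") // prints "recursive: 2"
  }
}
```

With this layout the flat listing returns nothing, which matches the symptom of data sitting one level below the registered partition location.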

Gopal Tiwari answered Jan 04 '26 18:01



