
How can I resolve "SparkException: Exception thrown in Future.get" issue?

I'm working with two PySpark DataFrames and doing a left-anti join on them to track day-to-day changes and then send an email.
My first attempt was:

diff = Table_a.join(
    Table_b,
    [Table_a.col1 == Table_b.col1, Table_a.col2 == Table_b.col2],
    how='left_anti'
)

The expected output is a PySpark DataFrame that may or may not contain rows.

This diff DataFrame gets its schema from Table_a. The first time I ran it, it showed the schema with no data, as expected. On every run after that, it just throws a SparkException:

Exception thrown in Future.get
Dheeraj Arya asked Sep 19 '25 08:09


2 Answers

I use Scala, but in my experience this happens when one of the underlying tables has changed in some way. I'd start by running display(Table_a) and display(Table_b) on their own to see whether either command fails. That should give you a hint about where the problem is.
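
Outside a Databricks notebook, where display() isn't available, the same check can be scripted. This is only a minimal sketch: find_failing_tables is a hypothetical helper (not part of PySpark), and limit(1).collect() forces just a small read, so it may not touch every underlying file.

```python
def find_failing_tables(frames):
    """Run a small action on each DataFrame and report which ones fail.

    `frames` maps a label (e.g. "Table_a") to a PySpark DataFrame.
    Returns a dict of label -> error text for the DataFrames that raised.
    """
    failures = {}
    for name, df in frames.items():
        try:
            # Spark is lazy, so an action is needed to surface read errors.
            df.limit(1).collect()
        except Exception as exc:  # broad on purpose: we only report the failure
            failures[name] = repr(exc)
    return failures


# Example call (assumes Table_a and Table_b already exist):
# print(find_failing_tables({"Table_a": Table_a, "Table_b": Table_b}))
```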

In any case, to actually resolve the issue, I'd suggest clearing the cached table metadata by running

%sql
REFRESH TABLE my_schema.table_a;
REFRESH TABLE my_schema.table_b;

and then redefining those variables, as in

Table_a = spark.table("my_schema.table_a")
Table_b = spark.table("my_schema.table_b")
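
If you'd rather stay in Python, both steps can probably be combined through the catalog API. A rough sketch, assuming an active SparkSession is passed in as spark; refresh_and_reload is a made-up helper name, while spark.catalog.refreshTable and spark.table are standard PySpark calls:

```python
def refresh_and_reload(spark, table_names):
    """Invalidate cached metadata for each table, then re-read it.

    Returns a dict of table name -> freshly defined DataFrame.
    """
    reloaded = {}
    for name in table_names:
        # Python-side equivalent of the SQL REFRESH statement.
        spark.catalog.refreshTable(name)
        # Re-create the DataFrame so it no longer points at stale files.
        reloaded[name] = spark.table(name)
    return reloaded


# Example call (table names taken from the answer above):
# tables = refresh_and_reload(spark, ["my_schema.table_a", "my_schema.table_b"])
# Table_a = tables["my_schema.table_a"]
# Table_b = tables["my_schema.table_b"]
```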

This worked for me - hope it helps you too.

Lucas Lima answered Sep 20 '25 23:09


Thank you, @Lucas Lima. Every time I create a new table, I cache it with the following PySpark command:

table_a.cache()
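
Note that cache() marks a DataFrame for caching; it doesn't clear anything. If the goal is to throw away a stale cached copy first, something like the following may be closer to what's wanted. It's only a sketch: recache is a hypothetical helper, while is_cached, unpersist() and cache() are existing DataFrame members.

```python
def recache(df):
    """Drop any existing cached copy of df, then mark it for caching again."""
    if df.is_cached:
        # Release the old cached data so it is recomputed on next use.
        df.unpersist()
    # cache() is lazy: data is stored only when the next action runs.
    return df.cache()


# Example call (assumes table_a already exists):
# table_a = recache(table_a)
```

To drop everything cached in the session instead, spark.catalog.clearCache() is the usual call.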

Hope the information helps.

Anasta Sia answered Sep 20 '25 23:09