
Spark job not ending: show() on a dataframe

I have to merge 5 dataframes into a single dataframe. The dataframes look like this:

+-------------------+---------------------------------------------------------------------------+
|Timestamp          |sentence                                                                   |
+-------------------+---------------------------------------------------------------------------+
|2020-03-13 12:01:32| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c updated field1 with beats|
+-------------------+---------------------------------------------------------------------------+
+-------------------+------------------------------------------------------------------------+
|Timestamp          |sentence                                                                |
+-------------------+------------------------------------------------------------------------+
|2020-03-04 23:10:59| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c updated field2 with kobo |
+-------------------+------------------------------------------------------------------------+
+-------------------+------------------------------------------------------------------------+
|Timestamp          |sentence                                                                |
+-------------------+------------------------------------------------------------------------+
|2020-03-13 12:01:32| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c updated field3 with beats|
+-------------------+------------------------------------------------------------------------+

+-------------------+-------------------------------------------------------------------+
|Timestamp          |sentence                                                           |
+-------------------+-------------------------------------------------------------------+
|2020-02-20 07:20:29| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c added an field4 with beats|
+-------------------+-------------------------------------------------------------------+

+-------------------+---------------------------------------------------------------+
|Timestamp          |sentence                                                       |
+-------------------+---------------------------------------------------------------+
|2020-02-20 07:20:29| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c added a field5 with beats|
+-------------------+---------------------------------------------------------------+

show() works fine when the union covers only the first 3 dataframes, but once the last two are included, the Spark job stops progressing.

To do the union I used:

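from functools import reduce  # reduce is not a builtin in Python 3

dfs = [df1, df2, df3, df4, df5]
df_final = reduce(lambda a, b: a.union(b), dfs)  # union matches columns by position
df_final.show()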

I want to display the result, but the job is stuck at showString at NativeMethodAccessorImpl.java:0

How do I go about this issue?

Mister Spurious asked Dec 02 '25 23:12



1 Answer

This looks fine to me: the dataframes have the same data types, which union needs, and the same column names, which unionByName needs.

I don't think this is an issue with union or unionByName. There might be some other issue, possibly a resource crunch from the scheduler's standpoint. Check whether any other jobs are running in parallel.
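To rule out a schema problem before looking at the scheduler, the check described above can be sketched as a small helper. This is only an illustration under assumptions: union_all is a hypothetical name, not something from the question, and it relies only on the dtypes attribute and the unionByName method that PySpark DataFrames provide.

```python
from functools import reduce


def union_all(dfs):
    """Union DataFrames by column name after checking that their schemas match.

    Hypothetical helper: expects objects exposing `dtypes` (a list of
    (column_name, type_name) tuples) and `unionByName`, as PySpark
    DataFrames do.
    """
    if not dfs:
        raise ValueError("need at least one DataFrame")

    # Compare schemas independent of column order, since unionByName
    # resolves columns by name rather than by position.
    expected = sorted(dfs[0].dtypes)
    for i, df in enumerate(dfs[1:], start=1):
        if sorted(df.dtypes) != expected:
            raise ValueError(
                f"schema mismatch at index {i}: {df.dtypes} != {dfs[0].dtypes}"
            )

    return reduce(lambda a, b: a.unionByName(b), dfs)
```

With the dataframes from the question, this would be called as union_all([df1, df2, df3, df4, df5]) followed by show() on the result. If it raises no error, the schemas agree and the hang is more likely caused by something else, such as cluster resources.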

Ram Ghadiyaram answered Dec 04 '25 14:12



