I have to merge 5 dataframes into a single dataframe. The dataframes look like this:
+-------------------+---------------------------------------------------------------------------+
|Timestamp |sentence |
+-------------------+---------------------------------------------------------------------------+
|2020-03-13 12:01:32| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c updated field1 with beats|
+-------------------+---------------------------------------------------------------------------+
+-------------------+------------------------------------------------------------------------+
|Timestamp |sentence |
+-------------------+------------------------------------------------------------------------+
|2020-03-04 23:10:59| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c updated field2 with kobo |
+-------------------+------------------------------------------------------------------------+
+-------------------+------------------------------------------------------------------------+
|Timestamp |sentence |
+-------------------+------------------------------------------------------------------------+
|2020-03-13 12:01:32| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c updated field3 with beats|
+-------------------+------------------------------------------------------------------------+
+-------------------+-------------------------------------------------------------------+
|Timestamp |sentence |
+-------------------+-------------------------------------------------------------------+
|2020-02-20 07:20:29| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c added an field4 with beats|
+-------------------+-------------------------------------------------------------------+
+-------------------+---------------------------------------------------------------+
|Timestamp |sentence |
+-------------------+---------------------------------------------------------------+
|2020-02-20 07:20:29| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c added a field5 with beats|
+-------------------+---------------------------------------------------------------+
show() works fine when the union is applied to the first 3 dataframes, but once the last two are included, the Spark job stops progressing.
To do the union I used:
from functools import reduce
dfs = [df1, df2, df3, df4, df5]
df_final = reduce(lambda a, b: a.union(b), dfs)
df_final.show()
I want to display the result, but the job is stuck at showString at NativeMethodAccessorImpl.java:0.
How do I resolve this?
This looks fine to me: the dataframes have the same data types, which union needs, and the same column names, which unionByName needs.
I don't think union or unionByName is the problem; something else is probably going on. It might be a resource crunch from the scheduler's standpoint. Check whether any other jobs are running in parallel.
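To rule the union itself in or out, one option is to check the schemas explicitly and then force a small action on each input on its own, so a slow or stalled dataframe shows up before the union runs. The following is only a sketch under assumptions: the helper names check_union_compatible, time_each_input and union_all are made up for illustration, and it assumes the inputs are PySpark DataFrames exposing dtypes, limit, collect and unionByName.

```python
from functools import reduce
import time


def check_union_compatible(dfs):
    """Raise ValueError unless every DataFrame has the same (name, type) pairs."""
    expected = dfs[0].dtypes  # PySpark: list of (column name, type string) tuples
    for position, df in enumerate(dfs[1:], start=2):
        if df.dtypes != expected:
            raise ValueError(
                f"DataFrame #{position} has schema {df.dtypes}, expected {expected}"
            )


def time_each_input(dfs):
    """Run a small action on each input alone and return the elapsed seconds."""
    timings = []
    for position, df in enumerate(dfs, start=1):
        # Printed before the action, so a stalled input is visible in the log.
        print(f"running input #{position}")
        start = time.perf_counter()
        df.limit(1).collect()  # triggers a job for this input only
        timings.append(time.perf_counter() - start)
    return timings


def union_all(dfs):
    """Validate the schemas, then fold the list into one DataFrame."""
    check_union_compatible(dfs)
    return reduce(lambda a, b: a.unionByName(b), dfs)
```

With the variables from the question, this would be called as time_each_input([df1, df2, df3, df4, df5]) followed by df_final = union_all([df1, df2, df3, df4, df5]). If one input (for example df4 or df5) stalls by itself, the cause is likely in how that dataframe is built or in cluster resources, not in the union.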