I have two dataframes df1
and df2
. Both of them have the following schema:
|-- ts: long (nullable = true)
|-- id: integer (nullable = true)
|-- managers: array (nullable = true)
| |-- element: string (containsNull = true)
|-- projects: array (nullable = true)
| |-- element: string (containsNull = true)
df1
is created from an avro file while df2
from an equivalent parquet file. However, If I execute, df1.unionAll(df2).show()
, I get the following error:
org.apache.spark.sql.AnalysisException: unresolved operator 'Union;
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37)
at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:174)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:49)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:103)
I ran into the same situation and it turns out to be not only the fields need to be the same but also you need to maintain the exact same ordering of the fields in both dataframe in order to make it work.
This is old and there are already some answers lying around but I just faced this problem while trying to make a union of two dataframes like in...
//Join 2 dataframes
val df = left.unionAll(right)
As others have mentioned, order matters. So just select right columns in the same order than left dataframe columns
//Join 2 dataframes, but take columns in the same order
val df = left.unionAll(right.select(left.columns.map(col):_*))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With