 

How to join datasets with same columns and select one?

I have two Spark DataFrames that I join and then select from. I want to select a specific column from one of them, but a column with the same name exists in the other one, so I get an exception about an ambiguous column.

I have tried this:

d1.as("d1").join(d2.as("d2"), $"d1.id" === $"d2.id", "left").select($"d1.columnName")

and this:

d1.join(d2, d1("id") === d2("id"), "left").select($"d1.columnName")

but it does not work.

Kratos asked Dec 28 '17 14:12



1 Answer

Which Spark version are you using? Can you post a sample of your DataFrames? Try renaming the clashing column in d2 before the join, so that "columnName" can only refer to d1:

val d2prim = d2.withColumnRenamed("columnName", "d2_columnName")
d1.join(d2prim, Seq("id"), "left_outer").select("columnName")
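
For reference, here is a fuller sketch of the same idea that can be run locally. The sample rows, the object name JoinSelectSketch, and the helper names selectViaRename and selectViaReference are made up for illustration, and a local SparkSession is assumed. The second helper is an alternative that may also work: it keeps both column names and resolves the column through the original d1 reference instead of a string name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinSelectSketch {

  // Rename the clashing column in d2 first, so "columnName" is unambiguous after the join.
  def selectViaRename(d1: DataFrame, d2: DataFrame): DataFrame = {
    val d2prim = d2.withColumnRenamed("columnName", "d2_columnName")
    d1.join(d2prim, Seq("id"), "left_outer").select("columnName")
  }

  // Alternative: keep both names and select through the d1 reference itself.
  def selectViaReference(d1: DataFrame, d2: DataFrame): DataFrame =
    d1.join(d2, d1("id") === d2("id"), "left").select(d1("columnName"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for this demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("join-select-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; both frames share "id" and "columnName".
    val d1 = Seq((1, "a"), (2, "b")).toDF("id", "columnName")
    val d2 = Seq((1, "x"), (3, "y")).toDF("id", "columnName")

    selectViaRename(d1, d2).show()
    selectViaReference(d1, d2).show()

    spark.stop()
  }
}
```

Note that joining on Seq("id") also collapses the two id columns into one, which avoids a second ambiguity if id is selected later.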
firsni answered Oct 24 '22 15:10
