
How to select all columns of a dataframe in join - Spark-scala

I am doing a join of two data frames and want to select all columns of the left frame. For example:

val join_df = first_df.join(second_df, first_df("id") === second_df("id") , "left_outer")

In the above I want to do select first_df.*. How can I select all columns of one frame in a join?

asked Jun 13 '16 by user2895589

3 Answers

With alias:

first_df.alias("fst").join(second_df, Seq("id"), "left_outer").select("fst.*")
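Expanding the one-liner above into a minimal end-to-end sketch (assuming an existing `SparkSession` named `spark`; the sample data here is invented for illustration):

```scala
// Sketch assuming a SparkSession `spark` is in scope; data is illustrative.
import spark.implicits._

val first_df  = Seq((1, "a"), (2, "b")).toDF("id", "value")
val second_df = Seq((1, "x")).toDF("id", "other")

// Alias the left frame, join on "id", then select every column of the alias.
val join_df = first_df.alias("fst")
  .join(second_df, Seq("id"), "left_outer")
  .select("fst.*")

join_df.printSchema()  // schema should contain only first_df's columns
```

Note that joining with `Seq("id")` (a using-column join) also avoids the duplicate `id` column you would get with `first_df("id") === second_df("id")`.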
answered Nov 12 '22 by user6022341


We can also do it with a leftsemi join. A leftsemi join returns only the columns of the left DataFrame, keeping just the rows that have a match in the right DataFrame.

Here we join two DataFrames, df1 and df2, on column col1.

df1.join(df2, df1.col("col1").equalTo(df2.col("col1")), "leftsemi")
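One caveat worth spelling out: unlike the question's left_outer join, leftsemi also drops left-side rows that have no match, so it is a filter rather than a true outer join. A small sketch (assuming a `SparkSession` named `spark`; the data is invented):

```scala
// Sketch assuming a SparkSession `spark` is in scope; data is illustrative.
import spark.implicits._

val df1 = Seq((1, "a"), (2, "b"), (3, "c")).toDF("col1", "v1")
val df2 = Seq((1, "x"), (3, "y")).toDF("col1", "v2")

// leftsemi keeps only df1's columns, and only the rows whose col1
// has a match in df2 (here: col1 = 1 and 3; the row with col1 = 2 is dropped).
val semi = df1.join(df2, df1("col1") === df2("col1"), "leftsemi")
```

Use this variant only when you want matching rows; if unmatched left rows must survive, stick with the alias approach and left_outer.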
answered Nov 12 '22 by Keshav Prashanth


Suppose you:

  1. Want to use the DataFrame syntax.
  2. Want to select all columns from df1 but only a couple from df2.
  3. Find it cumbersome to list the columns explicitly because df1 has many of them.

Then, you might do the following:

val selectColumns = df1.columns.map(df1(_)) ++ Array(df2("field1"), df2("field2"))
df1.join(df2, df1("key") === df2("key")).select(selectColumns:_*)
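To make the pattern concrete, here is the same idea with invented column names (assuming a `SparkSession` named `spark`; `field1`/`field2` stand in for whatever columns you actually want from df2):

```scala
// Sketch assuming a SparkSession `spark` is in scope; names are illustrative.
import spark.implicits._
import org.apache.spark.sql.Column

val df1 = Seq((1, "a", true)).toDF("key", "v1", "v2")
val df2 = Seq((1, "x", 2.0, "extra")).toDF("key", "field1", "field2", "field3")

// All of df1's columns, plus just two picked from df2.
val selectColumns: Array[Column] =
  df1.columns.map(df1(_)) ++ Array(df2("field1"), df2("field2"))

val out = df1.join(df2, df1("key") === df2("key")).select(selectColumns: _*)
// out's columns: key, v1, v2, field1, field2
```

The design choice here is building the projection programmatically from `df1.columns`, so the code stays correct even when df1 gains or loses columns.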
answered Nov 12 '22 by Bryan Johnson