 

Joining two dataframes in Spark

When I try to join two data frames using

DataFrame joindf = dataFrame.join(df, df.col(joinCol)); //.equalTo(dataFrame.col(joinCol)));

my program throws the exception below:

org.apache.spark.sql.AnalysisException: join condition 'url' of type string is not a boolean.;

Here the value of joinCol is url. What could cause this exception?

bigdata123 asked May 21 '26 00:05


2 Answers

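The join variants that take a Column as their second argument expect it to be an expression that evaluates to a boolean. A bare column such as df.col(joinCol) is of type string here, not a condition, which is why the exception is thrown.

If you want a simple equi-join based on a column name, use the version that takes the column name as a String: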

String joinCol = "foo";
dataFrame.join(df, joinCol);
zero323 answered May 22 '26 18:05


What that means is that the join condition must evaluate to a boolean expression. Let's say we want to join two dataframes on id; we can do the following:

With Python:

df1.join(df2, df1['id'] == df2['id'], 'left')  # the third parameter is the join type, here a left join


With Scala:

df1.join(df2, df1("id") === df2("id"))    // inner join on the id column
Abdul Mannan answered May 22 '26 20:05


