DataFrame a contains columns x, y, z, k
DataFrame b contains columns x, y, a
a.join(b, <condition in Java that joins on x and y>) ???
I tried using
a.join(b, a.col("x").equalTo(b.col("x")) && a.col("y").equalTo(b.col("y")), "inner")
but Java throws a compile error because && cannot be applied to Column operands.
Using concat() to concatenate DataFrame columns: Spark SQL functions provide concat() to concatenate two or more DataFrame columns into a single column. It can also take columns of different data types and concatenate them into one column; for example, it supports String, Int, Boolean, and array columns.
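A minimal sketch in Java (the DataFrame df and the columns first_name and last_name are hypothetical):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.concat;
import static org.apache.spark.sql.functions.lit;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Concatenate two string columns, with a space in between, into a new column
Dataset<Row> withFullName = df.withColumn("full_name",
        concat(col("first_name"), lit(" "), col("last_name")));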
You can select one or more columns of a Spark DataFrame by passing the column names you want to select to the select() function. Since a DataFrame is immutable, this creates a new DataFrame with the selected columns. The show() function is used to display the DataFrame contents.
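For example (assuming a DataFrame df that has columns x and y):

import static org.apache.spark.sql.functions.col;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Select a single column by name and print it
df.select("x").show();

// Select multiple columns; select() returns a new DataFrame
Dataset<Row> xy = df.select(col("x"), col("y"));
xy.show();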
You can add multiple columns to a Spark DataFrame in several ways. If you want to add a known set of columns, you can do so by chaining withColumn() or by using select(). However, sometimes you may need to add multiple columns after applying some transformations; in that case you can use either map() or foldLeft().
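A sketch of chaining withColumn() in Java (df and the derived column names are made up for illustration):

import static org.apache.spark.sql.functions.col;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Add two derived columns by chaining withColumn() calls
Dataset<Row> enriched = df
        .withColumn("x_times_two", col("x").multiply(2))
        .withColumn("x_is_positive", col("x").gt(0));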
Spark's filter() or where() function is used to filter rows from a DataFrame or Dataset based on one or more conditions or a SQL expression. You can use where() instead of filter() if you are coming from a SQL background; both functions behave exactly the same.
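For example, these two lines are equivalent in Java (assuming columns x and y on a hypothetical df):

import static org.apache.spark.sql.functions.col;

// Column-based condition
df.filter(col("x").gt(10).and(col("y").equalTo("foo"))).show();

// SQL-expression string; where() behaves exactly like filter()
df.where("x > 10 AND y = 'foo'").show();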
Spark SQL provides a group of methods on Column, marked as java_expr_ops, which are designed for Java interoperability. It includes the and method (see also or), which can be used here:
a.col("x").equalTo(b.col("x")).and(a.col("y").equalTo(b.col("y"))
If you want to join on multiple columns, you can do something like this:
a.join(b, scalaSeq, joinType)
You can store your columns in a Java List and convert it to a Scala Seq. Conversion of a Java List to a Scala Seq:
scalaSeq = JavaConverters.asScalaIteratorConverter(list.iterator()).asScala().toSeq();
Example: a = a.join(b, scalaSeq, "inner");
Note: dynamically determined join columns are easily supported this way.
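Putting it together in Java (a sketch; a and b are the DataFrames from the question, joined on the shared columns x and y):

import java.util.Arrays;
import java.util.List;
import scala.collection.JavaConverters;
import scala.collection.Seq;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Columns that exist in both DataFrames
List<String> joinCols = Arrays.asList("x", "y");

// Convert the Java List to the Scala Seq expected by join()
Seq<String> scalaSeq = JavaConverters
        .asScalaIteratorConverter(joinCols.iterator())
        .asScala()
        .toSeq();

// Equi-join on x and y; the result keeps a single copy of the join columns
Dataset<Row> joined = a.join(b, scalaSeq, "inner");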