I have two DataFrames in Spark SQL (D1 and D2).
I am trying to inner join them, D1.join(D2, "some column"),
and get back the data of only D1, not the complete data set.
Both D1 and D2 have the same columns.
Could someone please help me with this?
I am using Spark 1.6.
You can select single or multiple columns of a Spark DataFrame by passing the column names you want to the select() function. Since DataFrames are immutable, this creates a new DataFrame with the selected columns. The show() function is used to display the DataFrame's contents.
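For example, a minimal sketch (the DataFrame and column names here are made up, and sc is assumed to be an existing SparkContext, matching the question's Spark 1.6 setup):
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
// Hypothetical DataFrame with two columns
val df = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")
// select() returns a new DataFrame; df itself is unchanged
df.select("id").show()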
Let's say you want to join on the "id" column. Then you could write:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
// Alias both DataFrames so the duplicate column names can be told apart,
// then keep only d1's columns after the join.
d1.as("d1").join(d2.as("d2"), $"d1.id" === $"d2.id").select($"d1.*")
As an alternative, you could also do the following without adding aliases:
// Build the full list of d1's columns programmatically,
// so the select works whatever columns d1 has.
d1.join(d2, d1("id") === d2("id"))
  .select(d1.columns.map(c => d1(c)): _*)
You could use left_semi:
d1.as("d1").join(d2.as("d2"), $"d1.id" === $"d2.id", "left_semi")
Note that on Spark 1.6 the join-type string is "leftsemi"; the underscore form "left_semi" is only accepted in newer versions.
A semi join returns only the rows from the left dataset for which the join condition is met, and the result contains only the left side's columns, so no extra select is needed.
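A quick illustration with made-up data (reusing the sqlContext implicits from above):
val d1 = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
val d2 = Seq((2, "x"), (4, "y")).toDF("id", "value")
// Only d1's row with a matching id in d2 survives, with d1's schema.
d1.as("d1").join(d2.as("d2"), $"d1.id" === $"d2.id", "leftsemi").show()
// +---+-----+
// | id|value|
// +---+-----+
// |  2|    b|
// +---+-----+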
There's also another interesting join type: left_anti, which works similarly to left_semi but keeps only those rows where the condition is not met.
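A sketch, using the same aliases as above (note that anti joins were added in Spark 2.0, so this needs a newer version than the 1.6 in the question):
// Rows of d1 whose id has no match in d2
d1.as("d1").join(d2.as("d2"), $"d1.id" === $"d2.id", "left_anti")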