
Scala: How to combine two data frames?

The first DataFrame (Df) is:

ID Name ID2 Marks
1   12    1   333

The second DataFrame (Df2) is:

ID Name ID2 Marks
1         3   989
7   98    8   878

The output I need is:

ID Name ID2 Marks
1   12    1   333
1         3   989
7   98    8   878

Kindly help!

asked Mar 01 '18 05:03 by Ravikumar Reddy Yeruva



1 Answer

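Use the union function (unionAll is the Spark 1.x name; since Spark 2.0 it behaves the same as union). Both append the rows of df2 to df1, match columns by position rather than by name, and keep duplicate rows: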

df1.unionAll(df2)
df1.union(df2)

For example:

scala> val a = (1,"12",1,333)
a: (Int, String, Int, Int) = (1,12,1,333)

scala> val b = (1,"",3,989)
b: (Int, String, Int, Int) = (1,,3,989)

scala> val c = (7,"98",8,878)
c: (Int, String, Int, Int) = (7,98,8,878)

scala> import spark.implicits._
import spark.implicits._

scala> val df1 = List(a).toDF("ID","Name","ID2","Marks")
df1: org.apache.spark.sql.DataFrame = [ID: int, Name: string ... 2 more fields]

scala> val df2 = List(b, c).toDF("ID","Name","ID2","Marks")
df2: org.apache.spark.sql.DataFrame = [ID: int, Name: string ... 2 more fields]

scala> df1.show
+---+----+---+-----+
| ID|Name|ID2|Marks|
+---+----+---+-----+
|  1|  12|  1|  333|
+---+----+---+-----+


scala> df2.show
+---+----+---+-----+
| ID|Name|ID2|Marks|
+---+----+---+-----+
|  1|    |  3|  989|
|  7|  98|  8|  878|
+---+----+---+-----+


scala> df1.union(df2).show
+---+----+---+-----+
| ID|Name|ID2|Marks|
+---+----+---+-----+
|  1|  12|  1|  333|
|  1|    |  3|  989|
|  7|  98|  8|  878|
+---+----+---+-----+
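
Outside the spark-shell, the same steps can be sketched as a small standalone program. This is only an illustration under some assumptions: the object name CombineDataFrames, the helper sampleFrames, the app name, and the local[*] master are hypothetical and not part of the original answer. It also shows unionByName (available from Spark 2.3), which may be useful when the two DataFrames have the same columns in a different order, since union pairs columns by position.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object and helper names, used only for this sketch.
object CombineDataFrames {

  // Builds the two sample DataFrames from the question.
  def sampleFrames(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    val df1 = Seq((1, "12", 1, 333)).toDF("ID", "Name", "ID2", "Marks")
    val df2 = Seq((1, "", 3, 989), (7, "98", 8, 878)).toDF("ID", "Name", "ID2", "Marks")
    (df1, df2)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("combine-dataframes")
      .master("local[*]")
      .getOrCreate()

    val (df1, df2) = sampleFrames(spark)

    // union appends the rows of df2 to df1, pairing columns by position.
    val combined = df1.union(df2)
    combined.show()

    // Same data as df2, but with the columns in a different order.
    // A positional union here would pair ID with Marks.
    val reordered = df2.select("Marks", "ID2", "Name", "ID")

    // unionByName (Spark 2.3+) pairs columns by name instead,
    // so the result keeps df1's column order.
    val byName = df1.unionByName(reordered)
    byName.show()

    spark.stop()
  }
}
```

Neither union nor unionByName removes duplicate rows; calling distinct on the result is one way to drop them if that is what you need.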
answered Sep 23 '22 08:09 by Pavithran Ramachandran