
scala - Spark: How to union all DataFrames in a loop

Is there a way to build a single DataFrame by unioning DataFrames inside a loop?

This is a sample code:

// Assumes `import spark.implicits._` is in scope so that toDF is available
var fruits = List(
  "apple",
  "orange",
  "melon"
)

for (x <- fruits) {
  // df is re-created on every iteration, so only the last DataFrame survives the loop
  var df = Seq(("aaa", "bbb", x)).toDF("aCol", "bCol", "name")
}

I would like to obtain something like this:

aCol | bCol | fruitsName
-----|------|-----------
aaa  | bbb  | apple
aaa  | bbb  | orange
aaa  | bbb  | melon

Thanks again

asked Apr 19 '17 by J.soo


People also ask

How do you union DataFrames in PySpark in a for loop?

If you add "ID" into your window w as another partitionBy argument, you do not need to do the for loop and union at all. Just subset the dataframe into the ids you want test_df = test_df. where(col("ID"). isin(series_list)) and you are good to go.

What is difference between union and union all in Spark?

UNION and UNION ALL both return the rows found in either relation. UNION (alternatively, UNION DISTINCT) keeps only distinct rows, while UNION ALL does not remove duplicates from the result.
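Note that the DataFrame API's union keeps duplicates, i.e. it behaves like SQL UNION ALL. A rough sketch of getting UNION (distinct) semantics on top of it, with df1 and df2 as placeholder DataFrames of the same schema:

    // union keeps duplicate rows (UNION ALL semantics)
    val unionAll  = df1.union(df2)
    // add distinct() to drop duplicates (UNION / UNION DISTINCT semantics)
    val unionDist = df1.union(df2).distinct()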


1 Answer

You could create a sequence of DataFrames and then use reduce:

// Assumes `import spark.implicits._` is in scope so that toDF is available
val results = fruits
  .map(fruit => Seq(("aaa", "bbb", fruit)).toDF("aCol", "bCol", "name")) // one single-row DataFrame per fruit
  .reduce(_.union(_))                                                    // union them all into one DataFrame

results.show()
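One caveat: reduce throws if fruits is empty. A sketch of a variant that folds over an empty DataFrame with an explicit schema instead (the schema and the `spark` session name are assumptions, not part of the original answer):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    // Assumes a SparkSession named `spark` and `import spark.implicits._` are in scope.
    val schema = StructType(Seq(
      StructField("aCol", StringType),
      StructField("bCol", StringType),
      StructField("name", StringType)
    ))
    val empty = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

    // Start from the empty DataFrame and union one single-row DataFrame per fruit.
    val results = fruits.foldLeft(empty) { (acc, fruit) =>
      acc.union(Seq(("aaa", "bbb", fruit)).toDF("aCol", "bCol", "name"))
    }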
answered Nov 09 '22 by Ramon