 

Replicate Spark Row N-times

I want to duplicate a Row in a DataFrame. How can I do that?

For example, I have a DataFrame consisting of 1 Row, and I want to make a DataFrame with 100 identical Rows. I came up with the following solution:

    import org.apache.spark.sql.DataFrame

    var data: DataFrame = singleRowDF

    for (i <- 1 to 99) {
      data = data.unionAll(singleRowDF)   // append one more copy on each iteration
    }

But this introduces many transformations, and my subsequent actions seem to become very slow. Is there another way to do it?

asked Nov 03 '16 by Raphael Roth

1 Answer

You can add a column containing an array literal with 100 elements, then use explode so that each element produces its own row, and finally drop the "dummy" column:

import org.apache.spark.sql.functions._

val result = singleRowDF
  .withColumn("dummy", explode(array((1 to 100).map(lit): _*)))  // 100 array elements -> 100 rows
  .selectExpr(singleRowDF.columns: _*)                            // drop the "dummy" helper column
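
For reference, here is a minimal, self-contained sketch of the same approach; the input DataFrame and its column names (key, value) are made up purely for illustration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[*]").appName("replicate-row").getOrCreate()
    import spark.implicits._

    // Hypothetical single-row DataFrame; the schema is illustrative only.
    val singleRowDF = Seq(("a", 1)).toDF("key", "value")

    // One array element per desired copy; explode turns each element into a row.
    val replicated = singleRowDF
      .withColumn("dummy", explode(array((1 to 100).map(lit): _*)))
      .selectExpr(singleRowDF.columns: _*)

    replicated.count()  // 100

Unlike the unionAll loop, this is a single transformation over one DataFrame, so the lineage stays short and subsequent actions are not slowed down by 99 chained unions.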
answered Sep 25 '22 by Tzach Zohar