Replicate Spark Row N-times

Question

I want to duplicate a Row in a DataFrame. How can I do that?

For example, I have a DataFrame consisting of 1 Row, and I want to make a DataFrame with 100 identical Rows. I came up with the following solution:

  var data: DataFrame = singleRowDF

  for (i <- 1 to 100 - 1) {
    data = data.unionAll(singleRowDF)
  }

But this chains 99 union transformations, and my subsequent actions seem to become very slow. Is there another way to do it?

Tzach Zohar · Accepted Answer

You can add a column holding a literal Array of size 100, and then use explode so that each of its elements produces its own row. Then just drop this "dummy" column:

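import org.apache.spark.sql.functions._

val result = singleRowDF
  // 1 to 100 is inclusive, so the array has exactly 100 elements (1 until 100 would give only 99)
  .withColumn("dummy", explode(array((1 to 100).map(lit): _*)))
  // keep only the original columns, which drops "dummy"
  .selectExpr(singleRowDF.columns: _*)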
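
For reference, the same idea can be wrapped in a small reusable function. This is only a sketch: the object name `ReplicateRows`, the helper `replicate`, the temporary column name `__replica_idx`, and the local `SparkSession` setup are assumptions made for illustration, not part of the original answer. It uses an inclusive range so the array has exactly `n` elements, and `drop` instead of `selectExpr` to remove the helper column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, explode, lit}

object ReplicateRows {

  // Hypothetical helper: returns a DataFrame in which every row of `df` appears `n` times.
  def replicate(df: DataFrame, n: Int): DataFrame = {
    require(n >= 1, "n must be at least 1")
    // Assumption: this temporary column name does not clash with a column of `df`.
    val dummy = "__replica_idx"
    df
      // literal array with n elements; explode emits one output row per element
      .withColumn(dummy, explode(array((1 to n).map(lit): _*)))
      // remove the helper column again
      .drop(dummy)
  }

  def main(args: Array[String]): Unit = {
    // Local session for demonstration purposes only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("replicate-rows")
      .getOrCreate()
    import spark.implicits._

    val singleRowDF = Seq((1, "a")).toDF("id", "value")
    val result = replicate(singleRowDF, 100)

    println(result.count()) // 100

    spark.stop()
  }
}
```

Compared with the loop in the question, this adds a single explode step to the plan rather than 99 chained unions, so the plan size no longer grows with the number of copies.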

Tags: scala, apache-spark

Raphael Roth
