Adding dataframes to List in Spark

I'm trying to create n dataframes from the data in a single dataframe. I read the maximum integer value of one column, then run a SQL query in a loop to create one dataframe for each integer value up to that maximum.

This is my code:

val maxvalue = spark.sql("SELECT MAX(column4) as maxval FROM mydata").collect()(0).getInt(0)
for( i <- 0 to maxvalue){
         var query = "SELECT column1,column2,column3 FROM mydata WHERE column4 = "+ i
         val newdataframe = spark.sql(query)
         //add dataframe to List

}

I need to create n dataframes, but I don't know how to declare the List type before the loop and populate it inside the for loop.

Column types of the existing dataframe:

// +------------+------------+------------+------------+
// |     column1|     column2|     column3|     column4|
// +------------+------------+------------+------------+
// |      String|      Double|         Int|         Int|
// +------------+------------+------------+------------+

Column types of the new dataframes:

// +------------+------------+------------+
// |     column1|     column2|     column3|
// +------------+------------+------------+
// |      String|      Double|         Int|
// +------------+------------+------------+
eifersucht asked Dec 11 '22 13:12



1 Answer

You can create a mutable list and populate it:

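import scala.collection.mutable
import org.apache.spark.sql.DataFrame

// Buffer that collects one DataFrame per value of column4
val dfs = mutable.ArrayBuffer[DataFrame]()
for( i <- 0 to maxvalue){
  val query = "SELECT column1,column2,column3 FROM mydata WHERE column4 = "+ i
  val newdataframe = spark.sql(query)
  dfs += newdataframe
}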

But a better approach (one that avoids a mutable data structure) is to map the range of integers to a sequence of DataFrames:

val dfs: Seq[DataFrame] = (0 to maxvalue).map { i => 
  spark.sql("SELECT column1,column2,column3 FROM mydata WHERE column4 = " + i)
}
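
The same mapping can also be written with the DataFrame API instead of concatenated SQL strings. The following is only a minimal sketch: the object name, the `splitByColumn4` helper, the local `SparkSession`, and the sample rows are all hypothetical and exist only to make the example self-contained. It assumes `column4` is a non-null Int column and that the input has at least one row.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max}

object SplitByColumn4 {

  // Hypothetical helper: returns one DataFrame per integer in 0..max(column4).
  // Assumes column4 is an Int column and the input is not empty.
  def splitByColumn4(mydata: DataFrame): Seq[DataFrame] = {
    // Same MAX(column4) lookup as in the question, via the DataFrame API
    val maxvalue = mydata.agg(max(col("column4"))).head().getInt(0)

    // Map each integer to a filtered, projected DataFrame (no mutable state)
    (0 to maxvalue).map { i =>
      mydata.filter(col("column4") === i).select("column1", "column2", "column3")
    }
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would reuse its own `spark`
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-by-column4")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows matching the column types in the question
    val mydata = Seq(
      ("a", 1.0, 10, 0),
      ("b", 2.0, 20, 1),
      ("c", 3.0, 30, 1),
      ("d", 4.0, 40, 2)
    ).toDF("column1", "column2", "column3", "column4")

    val dfs = splitByColumn4(mydata)

    // Print the row count of each resulting DataFrame
    dfs.zipWithIndex.foreach { case (df, i) =>
      println(s"column4 = $i: ${df.count()} row(s)")
    }

    spark.stop()
  }
}
```

Because DataFrames are evaluated lazily, building the sequence does not run any Spark job by itself; work only happens when an action such as `count()` is called on one of the elements.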
Tzach Zohar answered Dec 25 '22 00:12