I'm trying to create "n" DataFrames based on the data of one. I check the maximum integer value of a column in that DataFrame and then loop an SQL query to create one DataFrame per integer value in the column.
This is my code:
val maxvalue = spark.sql("SELECT MAX(column4) AS maxval FROM mydata").collect()(0).getInt(0)
for (i <- 0 to maxvalue) {
  val query = "SELECT column1, column2, column3 FROM mydata WHERE column4 = " + i
  val newdataframe = spark.sql(query)
  // add the DataFrame to a List
}
I need to create "n" DataFrames, but I don't know how to declare the List type before the loop or how to populate it inside the for.
The existing DataFrame's column types:
// +------------+------------+------------+------------+
// | column1| column2| column3| column4|
// +------------+------------+------------+------------+
// | String| Double| Int| Int|
// +------------+------------+------------+------------+
The new DataFrames' column types:
// +------------+------------+------------+
// | column1| column2| column3|
// +------------+------------+------------+
// | String| Double| Int|
// +------------+------------+------------+
You can create a mutable collection and populate it inside the loop:
import scala.collection.mutable
import org.apache.spark.sql.DataFrame

val dfs = mutable.ArrayBuffer[DataFrame]()
for (i <- 0 to maxvalue) {
  val query = "SELECT column1, column2, column3 FROM mydata WHERE column4 = " + i
  val newdataframe = spark.sql(query)
  dfs += newdataframe   // append each per-value DataFrame to the buffer
}
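You could then inspect each collected DataFrame, for example (a minimal usage sketch, assuming the buffer was populated as above):

dfs.zipWithIndex.foreach { case (df, i) =>
  println(s"DataFrame for column4 = $i")
  df.show()
}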
But a better approach (avoiding a mutable data structure) is to map the range of integers to a sequence of DataFrames:
val dfs: Seq[DataFrame] = (0 to maxvalue).map { i =>
  spark.sql("SELECT column1, column2, column3 FROM mydata WHERE column4 = " + i)
}
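If you prefer not to build SQL strings by hand, you can get the same result with the DataFrame API instead of spark.sql (a sketch, assuming the table is registered as mydata):

import org.apache.spark.sql.functions.col

// Read the table once, then filter per value of column4 and keep only the three output columns.
val base = spark.table("mydata")
val dfs: Seq[DataFrame] = (0 to maxvalue).map { i =>
  base.where(col("column4") === i).select("column1", "column2", "column3")
}

Either way, the queries are lazy: nothing runs until you call an action (show, count, write, ...) on each resulting DataFrame.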