I am trying to define functions in Scala that take a list of strings as input, and converts them into the columns passed to the dataframe array arguments used in the code below.
val df = sc.parallelize(Array((1,1),(2,2),(3,3))).toDF("foo","bar")
val df2 = df
.withColumn("columnArray",array(df("foo").cast("String"),df("bar").cast("String")))
.withColumn("litArray",array(lit("foo"),lit("bar")))
More specifically, I would like to create functions colFunction
and litFunction
(or just one function if possible) that takes a list of strings as an input parameter and can be used as follows:
val df = sc.parallelize(Array((1,1),(2,2),(3,3))).toDF("foo","bar")
val colString = List("foo","bar")
val df2 = df
.withColumn("columnArray",array(colFunction(colString))
.withColumn("litArray",array(litFunction(colString)))
I have tried mapping the colString
to an Array of columns with all the transformations but this doesn't work. Any ideas on how this can be achieved? Many thanks for reading the question, and for any suggestions/solutions.
Spark 2.2+:
Support for Seq
, Map
and Tuple
(struct
) literals has been added in SPARK-19254. According to tests:
import org.apache.spark.sql.functions.typedLit
typedLit(Seq("foo", "bar"))
Spark < 2.2
Just map
with lit
and wrap with array
:
def asLitArray[T](xs: Seq[T]) = array(xs map lit: _*)
df.withColumn("an_array", asLitArray(colString)).show
// +---+---+----------+
// |foo|bar| an_array|
// +---+---+----------+
// | 1| 1|[foo, bar]|
// | 2| 2|[foo, bar]|
// | 3| 3|[foo, bar]|
// +---+---+----------+
Regarding transformation from Seq[String]
to Column
of type Array
this functionality is already provided by:
def array(colName: String, colNames: String*): Column
or
def array(cols: Column*): Column
Example:
val cols = Seq("bar", "foo")
cols match { case x::xs => df.select(array(x, xs:_*))
// or
df.select(array(cols map col: _*))
Of course all columns have to be of the same type.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With