
Spark: collect as Array[T] and not as Array[Row] from a DataFrame

I can collect a column like this using the RDD API.

df.map(r => r.getAs[String]("column")).collect

However, since I am starting from a Dataset, I would rather not switch API levels. A simple df.select("column").collect returns an Array[Row], on which the .flatten operator no longer works. How can I collect directly to an Array[T], e.g. Array[String]?

Georg Heiler asked Dec 10 '22 14:12


1 Answer

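With Datasets (Spark version >= 2.0.0), you just need to convert the DataFrame to a typed Dataset with .as[String] and then collect it. This needs import spark.implicits._ in scope so that the String encoder is available.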

df.select("column").as[String].collect()

This returns an Array[String].
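
For completeness, here is a minimal sketch of the same idea as a standalone program. The local SparkSession, the sample data, and the helper name columnAsStrings are assumptions made for illustration; only the select / as[String] / collect chain comes from the answer above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CollectColumnExample {

  // Hypothetical helper: collects one string column as Array[String].
  def columnAsStrings(df: DataFrame, name: String): Array[String] = {
    // Brings the implicit Encoder[String] into scope, which .as[String] needs.
    import df.sparkSession.implicits._
    df.select(name).as[String].collect()
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job usually has one already.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-column-example")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data with a single string column named "column".
    val df = Seq("a", "b", "c").toDF("column")

    // Untyped collect: every element is a Row wrapping one value.
    val rows = df.select("column").collect()

    // Typed collect: the elements are plain Strings.
    val values = columnAsStrings(df, "column")

    println(s"rows: ${rows.length}, values: ${values.mkString(", ")}")
    spark.stop()
  }
}
```

The same pattern should work for other types that have a built-in encoder (for example .as[Int] or .as[Long]), provided the column's actual type matches.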

Lokesh Yadav answered May 14 '23 15:05