I can collect a column like this using the RDD API.
df.map(r => r.getAs[String]("column")).collect
However, since I start with a Dataset, I would rather not switch API levels. A simple df.select("column").collect returns an Array[Row], on which .flatten no longer works.
How can I collect to an Array[T] (e.g. Array[String]) directly?
With Datasets (Spark version >= 2.0.0), you just need to convert the DataFrame to a typed Dataset and then collect it:
df.select("column").as[String].collect()
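This returns an Array[String]. The .as[String] call needs an implicit Encoder[String] in scope, which import spark.implicits._ provides (where spark is your SparkSession).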
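For completeness, here is a minimal self-contained sketch of the typed-collect approach, assuming Spark 2.x or later on the classpath and a local master. The object name CollectColumn, the helper names, and the sample data are made up for illustration and are not from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, for illustration only.
object CollectColumn {

  // Collect a single string column as Array[String] without dropping to the RDD API.
  def strings(spark: SparkSession, df: DataFrame, name: String): Array[String] = {
    import spark.implicits._ // provides the implicit Encoder[String] that .as[String] needs
    df.select(name).as[String].collect()
  }

  // Same idea for an integer column; the column type must be compatible with Int.
  def ints(spark: SparkSession, df: DataFrame, name: String): Array[Int] = {
    import spark.implicits._
    df.select(name).as[Int].collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("collect-column-sketch")
      .master("local[*]") // assumption: running locally
      .getOrCreate()
    import spark.implicits._

    // Sample data, invented for this sketch.
    val df = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("column", "n")

    println(strings(spark, df, "column").mkString(", "))
    println(ints(spark, df, "n").mkString(", "))

    spark.stop()
  }
}
```

If the column's type does not match the requested type (for example .as[Int] on a bigint column), Spark rejects the conversion at analysis time rather than truncating silently, so cast the column first if needed.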