Spark convert single column into array

How can I convert a single column of a DataFrame in Spark 2.0.1 into an array?

+---+----+
| id|dist|
+---+----+
|1.0| 2.0|
|2.0| 4.0|
|3.0| 6.0|
|4.0| 8.0|
+---+----+

should return Array(1.0, 2.0, 3.0, 4.0)

This attempt:

import scala.collection.JavaConverters._ 
df.select("id").collectAsList.asScala.toArray

fails with

java.lang.RuntimeException: Unsupported array type: [Lorg.apache.spark.sql.Row;
Georg Heiler asked Jan 05 '23 02:01

1 Answer

Why use JavaConverters if you then convert the Java List straight back to a Scala collection? You just need to collect the dataset and then map the resulting array of Rows to an array of doubles, like this:

df.select("id").collect.map(_.getDouble(0))
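
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the DataFrame from the question and extracts the column in two ways. The object name ColumnToArray, the helper columnAsDoubles, and the local master setting are my own assumptions for illustration, not part of the question. It also assumes the id column is of type double and contains no nulls, since getDouble throws on a null value.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, used only to make the sketch runnable.
object ColumnToArray {

  // Hypothetical helper: collect one double-typed column to the driver.
  // Each collected Row has a single field, read here by position 0.
  def columnAsDoubles(df: DataFrame, name: String): Array[Double] =
    df.select(name).collect().map(_.getDouble(0))

  def main(args: Array[String]): Unit = {
    // Local session for illustration; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-to-array")
      .getOrCreate()
    import spark.implicits._

    // Same data as in the question.
    val df = Seq((1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)).toDF("id", "dist")

    // Variant 1: collect Rows, then map each Row to its Double.
    val viaRows: Array[Double] = columnAsDoubles(df, "id")

    // Variant 2: turn the column into a typed Dataset[Double] first,
    // so collect() returns doubles directly.
    val viaDataset: Array[Double] = df.select("id").as[Double].collect()

    println(viaRows.mkString("Array(", ", ", ")"))
    println(viaDataset.mkString("Array(", ", ", ")"))

    spark.stop()
  }
}
```

Both variants pull the whole column to the driver, so they only make sense when the column fits in driver memory. Row order after collect is not guaranteed in general; add an orderBy("id") before collecting if the order matters.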
cheseaux answered Jan 10 '23 15:01