Can anyone tell me how to convert a Spark DataFrame into an Array[String] in Scala?
I have used the following:
val x = df.select(columns.head, columns.tail: _*).collect()
The above snippet gives me an Array[Row], not an Array[String].
This should do the trick:
df.select(columns: _*).collect.map(_.toSeq)
Note that this yields an Array[Seq[Any]]; map each value through toString if you need plain Strings.
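Since collect() hands back Row objects, each one still has to be turned into a String. A minimal plain-Scala sketch of that last step, using Seq[Any] values as stand-ins for Rows (an assumed shape; no Spark needed to run it):

```scala
// Stand-ins for the Row objects returned by collect(); a Spark Row exposes
// its values the same way via row.toSeq.
val collected: Array[Seq[Any]] = Array(Seq("alice", 30), Seq("bob", 25))

// Join each row's values into one comma-separated String, mirroring
// row.mkString(",") on a real Row.
val asStrings: Array[String] = collected.map(_.mkString(","))
// asStrings: Array("alice,30", "bob,25")
```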
DataFrame to Array[String]
data.collect.map(_.toSeq).flatten
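Flattening gives one long array of cell values, but its element type is Any, not String. A plain-Scala sketch (Seq[Any] again standing in for Row) showing the extra toString needed to land on Array[String]:

```scala
// rows stands in for the result of data.collect.map(_.toSeq)
val rows: Array[Seq[Any]] = Array(Seq("a", 1), Seq("b", 2))

// flatten yields Array[Any]; toString converts each cell to a String
val flat: Array[String] = rows.flatten.map(_.toString)
// flat: Array("a", "1", "b", "2")
```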
You can also use the following:
data.collect.map(row => row.getString(0))
If you have more than one column, the latter is better, since it extracts only the needed column before collecting to the driver:
data.rdd.map(row => row.getString(0)).collect
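One caveat with getString(0): it fails if the first column is not actually typed as a String (for example, an inferred Int). A defensive plain-Scala sketch of the per-row conversion, with Seq[Any] standing in for Row:

```scala
// Rows whose first column is numeric: getString(0) on a real Row would fail
// with a ClassCastException, so go through toString instead.
val rows: Array[Seq[Any]] = Array(Seq(1, "a"), Seq(2, "b"))
val firstCol: Array[String] = rows.map(_.head.toString)
// firstCol: Array("1", "2")
```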
If you are planning to read the dataset line by line, then you can use the iterator over the dataset:
Dataset<Row> csv = session.read()
        .format("csv")
        .option("sep", ",")
        .option("inferSchema", true)
        .option("escape", "\"")
        .option("header", true)
        .option("multiline", true)
        .load("users/abc/....");

for (Iterator<Row> iter = csv.toLocalIterator(); iter.hasNext();) {
    String[] item = iter.next().toString().split(",");
}
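One thing to watch with the split(",") above: it will misparse any field whose value itself contains a comma. A quick plain-Scala illustration:

```scala
// A CSV-style line with a quoted field that contains a comma
val line = "\"Doe, John\",42"

// Naive splitting breaks the quoted field into two pieces
val naive: Array[String] = line.split(",")
// naive has 3 elements, not the 2 logical fields
```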