
Convert spark dataframe to Array[String]

Can anyone tell me how to convert a Spark DataFrame into an Array[String] in Scala?

I have used the following:

val x = df.select(columns.head, columns.tail: _*).collect()

The above snippet gives me an Array[Row], not an Array[String].

Bharath asked Sep 09 '17 20:09


3 Answers

This should do the trick:

df.select(columns: _*).collect.map(_.toSeq)
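Note that `.map(_.toSeq)` yields one `Seq[Any]` per row, so one more step is needed to reach `Array[String]`. A minimal sketch of that last step, using plain Scala collections to model the collected rows (the sample data is hypothetical; no Spark session is assumed):

```scala
object RowsToStrings {
  // Each collected row modeled as a Seq[Any], the shape Row.toSeq returns
  val rows: Array[Seq[Any]] = Array(Seq("alice", 30), Seq("bob", 25))

  // One comma-joined string per row, like row.mkString(",") on a Spark Row
  val asStrings: Array[String] = rows.map(_.mkString(","))
}
```

On a real DataFrame the same idea is `df.select(columns: _*).collect.map(_.mkString(","))`.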
Sohum Sachdev answered Nov 08 '22 07:11


DataFrame to Array[String]

data.collect.map(_.toSeq).flatten

You can also use the following:

data.collect.map(row => row.getString(0))

If the DataFrame is large, it is better to use the last one, which maps over the RDD so the strings are extracted on the executors before collecting:

data.rdd.map(row => row.getString(0)).collect
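The difference between the two variants is only where `getString(0)` runs; the per-row extraction itself can be sketched with a plain Scala stand-in for `Row` (hypothetical sample data, no Spark session assumed):

```scala
object FirstColumn {
  // Collected rows modeled as Seq[Any]; getString(0) corresponds to
  // reading the first field and treating it as a String
  val rows: Array[Seq[Any]] = Array(Seq("alice", 30), Seq("bob", 25))

  val firstCol: Array[String] = rows.map(_.head.asInstanceOf[String])
}
```

With a real DataFrame, `data.rdd.map(...)` performs this extraction on the executors, so only the resulting strings are shipped to the driver.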
loneStar answered Nov 08 '22 09:11


If you are planning to read the dataset line by line, you can use an iterator over the dataset:

Dataset<Row> csv = session.read().format("csv")
    .option("sep", ",")
    .option("inferSchema", true)
    .option("escape", "\"")
    .option("header", true)
    .option("multiline", true)
    .load("users/abc/....");

for (Iterator<Row> iter = csv.toLocalIterator(); iter.hasNext();) {
    String[] item = iter.next().toString().split(",");
}
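One caveat with the split-on-`toString` approach: `Row.toString` renders a row as `[value1,value2]`, so the brackets end up in the first and last fields unless they are stripped. A small Scala sketch of the cleanup (the sample strings are hypothetical, shaped like `Row.toString` output):

```scala
object SplitRenderedRows {
  // Strings shaped like Row.toString output, e.g. "[alice,30]"
  val rendered = List("[alice,30]", "[bob,25]")

  // Strip the surrounding brackets, then split on the field separator
  val items: List[Array[String]] =
    rendered.map(s => s.stripPrefix("[").stripSuffix("]").split(","))
}
```

Splitting on a bare comma also breaks if a field itself contains a comma; for anything beyond quick inspection, reading typed columns via `row.getString(i)` is safer.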
2 revs answered Nov 08 '22 09:11