 

Change Iterable[(String, Double)] of an RDD to Array or List

I have an org.apache.spark.rdd.RDD[(String, (Double, Double), Iterable[(String, Double)])], but working with the Iterable directly seems awkward. Is there any way I can convert it to an Array[(String, Double)]?

asked Aug 10 '15 by Kevin Zakka

People also ask

How do you convert an RDD to a string in PySpark?

Try x = all_coord_iso_rdd.take(4). Then print(type(x)); you'll see that it is a list (of tuples). Then just convert it to a string.

Which function is used to pipe each partition of the RDD through a shell command?

The pipe operation: rdd.pipe passes each partition of the RDD through a shell command, e.g. a Perl or bash script. Elements of the RDD are written to the command's stdin, and the lines it prints are returned as an RDD of strings.
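For illustration, a minimal sketch of RDD.pipe in Scala (assuming a SparkContext named sc and a POSIX tr available on the workers):

val words = sc.parallelize(Seq("spark", "rdd", "pipe"))
// each element is written to the command's stdin (one per line);
// each line the command prints becomes an element of the resulting RDD[String]
val upper = words.pipe(Seq("tr", "a-z", "A-Z"))
upper.collect()  // Array(SPARK, RDD, PIPE)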


1 Answer

You can simply use Iterable.toArray:

rdd.map{case (x, y, iter) => (x, y, iter.toArray)}

or Iterable.toList:

rdd.map{case (x, y, iter) => (x, y, iter.toList)}
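For context, here is a minimal end-to-end sketch, assuming a SparkContext named sc and made-up sample data; the groupBy step is just one hypothetical way such an RDD could have been built:

import org.apache.spark.rdd.RDD

// hypothetical sample rows: (id, (x, y), neighbour, distance)
val rows = sc.parallelize(Seq(
  ("a", (1.0, 2.0), "b", 0.5),
  ("a", (1.0, 2.0), "c", 1.5),
  ("b", (3.0, 4.0), "a", 0.5)
))

// one way an RDD[(String, (Double, Double), Iterable[(String, Double)])] can arise
val grouped: RDD[(String, (Double, Double), Iterable[(String, Double)])] =
  rows.groupBy { case (id, xy, _, _) => (id, xy) }
      .map { case ((id, xy), group) =>
        (id, xy, group.map { case (_, _, n, d) => (n, d) })
      }

// materialise the Iterable as an Array (or use toList) so it is easier to work with
val withArrays: RDD[(String, (Double, Double), Array[(String, Double)])] =
  grouped.map { case (id, xy, iter) => (id, xy, iter.toArray) }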
answered Oct 31 '22 by zero323