I have an org.apache.spark.rdd.RDD[(String, (Double, Double), Iterable[(String, Double)])], but working with the Iterable seems hard. Is there any way I can change it to an Array[(String, Double)]?
Try x = all_coord_iso_rdd.take(4). Then print(type(x)) and you'll see that it's a list (of tuples). Then just convert it to a string.
cogroup is also called groupWith. cartesian, when called on datasets of types T and U, returns a dataset of (T, U) pairs (all pairs of elements). pipe sends each partition of the RDD through a shell command, e.g. a Perl or bash script. coalesce decreases the number of partitions in the RDD to numPartitions.
You can simply use Iterable.toArray:
rdd.map { case (x, y, iter) => (x, y, iter.toArray) }
or Iterable.toList:
rdd.map { case (x, y, iter) => (x, y, iter.toList) }
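To see the conversion without a Spark cluster, here is a minimal sketch that uses a plain Scala Seq as a stand-in for the RDD. The object name, the sample records, and the field names (key, coords, iter) are all made up for illustration. The function passed to map has the same shape as the one passed to rdd.map above.

```scala
object IterableToArraySketch {
  // Hypothetical sample records with the same element type as the RDD in the question.
  val records: Seq[(String, (Double, Double), Iterable[(String, Double)])] = Seq(
    ("a", (1.0, 2.0), Iterable(("x", 0.5), ("y", 1.5))),
    ("b", (3.0, 4.0), Iterable(("z", 2.5)))
  )

  // Convert the third field of every record from Iterable to Array.
  // The explicit type annotation documents the resulting element type.
  val converted: Seq[(String, (Double, Double), Array[(String, Double)])] =
    records.map { case (key, coords, iter) => (key, coords, iter.toArray) }

  // Arrays allow indexed access and a direct length lookup.
  val firstLabel: String = converted.head._3(0)._1
  val sizes: Seq[Int] = converted.map(_._3.length)

  def main(args: Array[String]): Unit = {
    println(firstLabel)
    println(sizes.mkString(","))
  }
}
```

Note that toArray copies every element of the group into memory on one executor, so for very large groups toList or keeping the Iterable may be the safer choice.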