
How to transpose a Dataset in Scala?

Tags:

csv

scala

rdd

I want to transpose a dataset in Scala.

My CSV file is:

a,b,c,d
e,f,g,h
i,j,k,l
m,n,o,p

I need the result to be:

a,e,i,m
b,f,j,n
c,g,k,o
d,h,l,p
rosy asked Dec 03 '25 14:12


1 Answer

Here is a one-liner that I think works in Spark.

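val a = List(
  List('a', 'b', 'c', 'd'),
  List('e', 'f', 'g', 'h'),
  List('i', 'j', 'k', 'l'),
  List('m', 'n', 'o', 'p')
)
// sc is the SparkContext (predefined in spark-shell)
val b = sc.parallelize(a, 1)

b.flatMap(_.zipWithIndex)   // (element, column index) pairs
  .groupBy(_._2)            // column index -> pairs at that index
  .mapValues(_.map(_._1))   // keep only the elements
  .collectAsMap()           // bring the result to the driver as a Map
  .toList
  .sortBy(_._1)             // order by column index
  .map(_._2)                // drop the index
//> List[Iterable[Char]] = List(
// List(a, e, i, m), List(b, f, j, n), List(c, g, k, o), List(d, h, l, p))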

Zip each element of each row with its index, then group by that index. That gives a mapping from each index (0, 1, 2, 3) to the (element, index) pairs at that index. mapValues then strips the indices, so each value is just the list of elements. An RDD has no .toList, so collectAsMap first brings the result back as a map, which is then converted to a list so it can be sorted by index. A final map drops the index and keeps only the element values.
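The same zip, group, and sort steps can be tried without a Spark cluster. The following is a minimal sketch using plain Scala collections rather than an RDD, so it only illustrates the logic and is not a distributed solution. The names TransposeSketch and transposeByIndex are hypothetical, and the input lines are the ones from the question, hard-coded instead of read from a file.

```scala
object TransposeSketch {
  // Transpose rows by pairing every cell with its column index,
  // grouping by that index, and reading the groups back in index order.
  def transposeByIndex[A](rows: List[List[A]]): List[List[A]] =
    rows
      .flatMap(_.zipWithIndex)   // (cell, columnIndex) pairs, row by row
      .groupBy(_._2)             // columnIndex -> pairs in that column
      .toList
      .sortBy(_._1)              // restore column order
      .map { case (_, cells) => cells.map(_._1) } // drop the index

  def main(args: Array[String]): Unit = {
    // The CSV content from the question, hard-coded for illustration.
    val csvLines = List("a,b,c,d", "e,f,g,h", "i,j,k,l", "m,n,o,p")
    val rows = csvLines.map(_.split(",").toList)

    val transposed = transposeByIndex(rows)
    transposed.foreach(row => println(row.mkString(",")))

    // For data that fits in memory and has rows of equal length,
    // the standard library's transpose gives the same result.
    assert(transposed == rows.transpose)
  }
}
```

For a small local file, the built-in transpose on a list of lists is probably the simpler choice. The index-based version is shown only because it mirrors the steps of the Spark one-liner.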

The Archetypal Paul answered Dec 05 '25 05:12


