How do I take the transpose of a dataset in Scala?
My CSV file is:
a,b,c,d
e,f,g,h
i,j,k,l
m,n,o,p
I need the result as:
a,e,i,m
b,f,j,n
c,g,k,o
d,h,l,p
Here is a one-liner that I think works in Spark:
val a = List(
List('a', 'b', 'c', 'd'),
List('e', 'f', 'g', 'h'),
List('i', 'j', 'k', 'l'),
List('m', 'n', 'o', 'p')
)
val b = sc.parallelize(a, 1)
b.flatMap(_.zipWithIndex)
.groupBy(_._2)
.mapValues(_.map(_._1))
.collectAsMap()
.toList
.sortBy(_._1)
.map(_._2)
//> List[Iterable[Char]] = List(
// List(a, e, i, m), List(b, f, j, n), List(c, g, k, o), List(d, h, l, p))
Zip each element of each list with its index, then group by that index, which gives a map from each index to the (element, index) pairs in that column. Convert the values to just the element values. Then collect the result to the driver as a map with collectAsMap (an RDD has no .toList) and turn that map into a list so it can be sorted. Finally, sort by index and extract (with another map) just the element values.
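For completeness, here is a hedged sketch of the same pipeline starting from the CSV file itself rather than a hard-coded list. The file path is just a placeholder, and sc is assumed to be the Spark shell's SparkContext. For data small enough to hold on the driver, the Scala standard library also has transpose built in.
// Read the CSV (placeholder path) and split each line into its fields.
val rows = sc.textFile("input.csv").map(_.split(",").toList)
rows.flatMap(_.zipWithIndex)   // (value, columnIndex) pairs
  .groupBy(_._2)               // group by column index
  .mapValues(_.map(_._1))      // keep only the values
  .collectAsMap()              // bring the (small) result to the driver
  .toList
  .sortBy(_._1)                // restore column order
  .map(_._2)
//> List(List(a, e, i, m), List(b, f, j, n), List(c, g, k, o), List(d, h, l, p))
// For purely local data, the standard library does this directly:
List(
  List('a', 'b', 'c', 'd'),
  List('e', 'f', 'g', 'h'),
  List('i', 'j', 'k', 'l'),
  List('m', 'n', 'o', 'p')
).transpose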