How do I take the transpose of a dataset in Scala?
My CSV file is:
a,b,c,d
e,f,g,h
i,j,k,l
m,n,o,p
I need the result as:
a,e,i,m
b,f,j,n
c,g,k,o
d,h,l,p
Here is a one-liner that I think works in Spark:
val a = List(
List('a', 'b', 'c', 'd'),
List('e', 'f', 'g', 'h'),
List('i', 'j', 'k', 'l'),
List('m', 'n', 'o', 'p')
)
val b = sc.parallelize(a, 1)
b.flatMap(_.zipWithIndex)
.groupBy(_._2)
.mapValues(_.map(_._1))
.collectAsMap()
.toList
.sortBy(_._1)
.map(_._2)
//> List[Iterable[Char]] = List(
// List(a, e, i, m), List(b, f, j, n), List(c, g, k, o), List(d, h, l, p))
Zip each element of each list with its index, then group by that index, which gives a map from each index to the (element, index) pairs in that column. Convert the values to just the element values. Then collect the result to the driver as a map with collectAsMap (an RDD has no .toList) and turn that map into a list so it can be sorted. Finally, sort by index and extract (with another map) just the element values.
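For completeness, here is a hedged sketch of the same pipeline starting from the CSV file itself rather than a hard-coded list. The file path is just a placeholder, and sc is assumed to be the Spark shell's SparkContext. For data small enough to hold on the driver, the Scala standard library also has transpose built in.
// Read the CSV (placeholder path) and split each line into its fields.
val rows = sc.textFile("input.csv").map(_.split(",").toList)
rows.flatMap(_.zipWithIndex)   // (value, columnIndex) pairs
  .groupBy(_._2)               // group by column index
  .mapValues(_.map(_._1))      // keep only the values
  .collectAsMap()              // bring the (small) result to the driver
  .toList
  .sortBy(_._1)                // restore column order
  .map(_._2)
//> List(List(a, e, i, m), List(b, f, j, n), List(c, g, k, o), List(d, h, l, p))
// For purely local data, the standard library does this directly:
List(
  List('a', 'b', 'c', 'd'),
  List('e', 'f', 'g', 'h'),
  List('i', 'j', 'k', 'l'),
  List('m', 'n', 'o', 'p')
).transpose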