
How to transform Dataset[(String, Seq[String])] to Dataset[(String, String)]?


This is probably a simple problem, but I'm just starting my adventure with Spark.

Problem: I'd like to get the structure shown under "Expected result" below in Spark. Currently I have the following structure:

title1, {word11, word12, word13 ...}
title2, {word12, word22, word23 ...}

The data is stored in a Dataset[(String, Seq[String])].

Expected result: I would like to get (word, title) tuples:

word11, {title1}
word12, {title1}

What I did:
1. Built (title, Seq(word1, word2, word3)) pairs:

docs.mapPartitions { iter =>
  iter.map {
     case (title, contents) => {
        val textToLemmas: Seq[String] = toText(....)
        (title, textToLemmas)
     }
  }
}
2. I tried to use .map to transform my structure into (word, title) tuples, but couldn't make it work.
3. I tried to iterate through all the elements, but then I couldn't return the right type.
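The .map attempt runs into a shape problem: map produces exactly one output element per input element, and Spark's Dataset.map behaves the same way per row. A minimal sketch on plain Scala collections (not Spark; the names docs, nested and flat are illustrative) shows that mapping each (title, words) row to its word pairs yields a nested Seq, which then has to be flattened:

```scala
// Sample data shaped like the question's Dataset[(String, Seq[String])],
// held in a plain Scala Seq instead of a Spark Dataset.
val docs: Seq[(String, Seq[String])] = Seq(
  ("title1", Seq("word11", "word12", "word13")),
  ("title2", Seq("word12", "word22", "word23"))
)

// map is one-in, one-out: each (title, words) row becomes ONE Seq of pairs,
// so the result is nested instead of a flat Seq[(String, String)].
val nested: Seq[Seq[(String, String)]] =
  docs.map { case (title, words) => words.map(word => (word, title)) }

// Flattening the nested result gives one (word, title) pair per word.
val flat: Seq[(String, String)] = nested.flatten
```

Combining the map and the flatten into one step is exactly what flatMap does.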

Thanks for any answers.

asked by meernet, Jun 25 '26 17:06


2 Answers

This should work:

val result = dataSet.flatMap { case (title, words) => words.map((_, title)) }
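The same flatMap can be checked on plain Scala collections without a SparkSession. A minimal sketch, assuming the sample rows from the question; on a real Dataset, flatMap additionally needs an implicit Encoder for (String, String), which is typically brought in with import spark.implicits._ (where spark is your SparkSession):

```scala
// Sample rows from the question, as a plain Scala Seq (not a Spark Dataset).
val docs: Seq[(String, Seq[String])] = Seq(
  ("title1", Seq("word11", "word12", "word13")),
  ("title2", Seq("word12", "word22", "word23"))
)

// flatMap emits zero or more outputs per input row and flattens them.
// words.map((_, title)) is shorthand for words.map(word => (word, title)).
val result: Seq[(String, String)] =
  docs.flatMap { case (title, words) => words.map((_, title)) }
```

Note that word12 appears twice in the result, once paired with each title it occurs in.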
answered by Yuval Itzchakov, Jun 27 '26 10:06


Another solution is to call the explode function like this:

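import org.apache.spark.sql.functions.{col, explode}
// explode takes a Column (not a String); putting the exploded word first gives (word, title)
dataset.select(explode(col("_2")), col("_1")).as[(String, String)]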
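Both answers drop titles that have no words: explode produces no rows for an empty (or null) array, and flatMap produces no pairs for an empty Seq. If such titles must be kept, Spark's explode_outer (available since Spark 2.2) emits a row with a null instead. A plain-Scala sketch of the dropping behavior, using an illustrative "empty" row that is not part of the question's data:

```scala
// One normal title and one (illustrative) title with no words at all.
val docs: Seq[(String, Seq[String])] = Seq(
  ("title1", Seq("word11", "word12")),
  ("empty", Seq.empty[String])
)

// flatMap contributes nothing for an empty Seq, so "empty" disappears,
// mirroring how explode drops rows whose array is empty.
val pairs: Seq[(String, String)] =
  docs.flatMap { case (title, words) => words.map((_, title)) }
```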

Hope this helps. Best regards.

answered by Haroun Mohammedi, Jun 27 '26 10:06
