I have a nested string as shown below. I want to flat-map it to produce unique rows in Spark.
My DataFrame has:
A,B,"x,y,z",D
I want to convert it to produce output like:
A,B,x,D
A,B,y,D
A,B,z,D
How can I do that?
Basically, how can I do a flat map and apply any function inside the DataFrame?
Thanks
In Spark SQL, flatten is a built-in function that converts a column holding an array of arrays (a nested array, i.e. ArrayType(ArrayType(StringType))) into a single array column on the DataFrame. Spark SQL is the Spark module for structured data processing.
flatMap() on a Spark DataFrame operates similarly to the RDD version: it applies the specified function to every element of the DataFrame, and because each element can be split into several results or contribute none, the result count of flatMap() can differ from the input count.
flatMap is a transformation. It applies a function to each element of an RDD and returns the results as a new RDD. It is similar to map, but flatMap allows the function to return 0, 1, or more elements per input.
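The difference between map and flatMap can be seen without Spark at all. The following is a minimal sketch on plain Scala collections, assuming a single row shaped like the one in the question; the object and value names (FlatMapSketch, rows, mapped, flattened, dropped) are made up for illustration and are not part of any Spark API.

```scala
// Plain Scala collections, no Spark required; all names here are illustrative only.
object FlatMapSketch {
  // One input row mirroring the question: the third field holds a comma-separated list.
  val rows: Seq[(String, String, String, String)] = Seq(("A", "B", "x,y,z", "D"))

  // map yields exactly one result per input: here, a nested sequence of rows.
  val mapped: Seq[Seq[(String, String, String, String)]] =
    rows.map { case (a, b, c, d) => c.split(",").toSeq.map(v => (a, b, v, d)) }

  // flatMap concatenates the per-row results into one flat sequence.
  val flattened: Seq[(String, String, String, String)] =
    rows.flatMap { case (a, b, c, d) => c.split(",").toSeq.map(v => (a, b, v, d)) }

  // Returning an empty collection drops the input row (the "0 elements" case).
  val dropped: Seq[(String, String, String, String)] =
    rows.flatMap { case (a, b, c, d) =>
      c.split(",").toSeq.filter(_ == "q").map(v => (a, b, v, d))
    }

  def main(args: Array[String]): Unit =
    flattened.foreach(println) // prints (A,B,x,D), (A,B,y,D), (A,B,z,D) on separate lines
}
```

Here mapped still has one element (a nested sequence of three rows), while flattened has three rows, which is the shape the question asks for.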
Spark 2.0+: use Dataset.flatMap:
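// .as[...] and the tuple encoder need the session implicits (assumes a SparkSession named spark)
import spark.implicits._
val ds = df.as[(String, String, String, String)]
ds.flatMap {
  // emit one output row per comma-separated value in the third column
  case (x1, x2, x3, x4) => x3.split(",").map((x1, x2, _, x4))
}.toDF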
Spark 1.3+: use the split and explode functions:
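import org.apache.spark.sql.functions.{explode, split}
// split turns "x,y,z" into an array column; explode emits one row per array element
val df = Seq(("A", "B", "x,y,z", "D")).toDF("x1", "x2", "x3", "x4")
df.withColumn("x3", explode(split($"x3", ",")))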
Spark 1.x: use DataFrame.explode (deprecated in Spark 2.x):
df.explode($"x3")(_.getAs[String](0).split(",").map(Tuple1(_)))