Spark dataframe add a row for every existing row

I have a DataFrame with the following columns:

groupid,unit,height
----------------------
1,in,55
2,in,54

I want to create another DataFrame that keeps the original rows and adds, for each of them, a row where unit=cm and height=height*2.54.

Resulting dataframe:

groupid,unit,height
----------------------
1,in,55
2,in,54
1,cm,139.7
2,cm,137.16

I'm not sure how to use a Spark UDF and explode here. Any help is appreciated. Thanks in advance.

asked Jul 10 '17 03:07

dreddy

1 Answer

You can create a second DataFrame with the changes you need using withColumn, then union the two DataFrames:

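// In spark-shell. With a SparkSession named spark, use: import spark.implicits._
import sqlContext.implicits._
import org.apache.spark.sql.functions._

// Sample data from the question
val df = Seq(
  (1, "in", 55),
  (2, "in", 54)
).toDF("groupid", "unit", "height")

// Same rows, with unit overwritten to "cm" and height converted from inches
val df2 = df.withColumn("unit", lit("cm")).withColumn("height", col("height") * 2.54)

// union matches columns by position, so both DataFrames must keep the same column order
df.union(df2).show(false)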

You should get:

+-------+----+------+
|groupid|unit|height|
+-------+----+------+
|1      |in  |55.0  |
|2      |in  |54.0  |
|1      |cm  |139.7 |
|2      |cm  |137.16|
+-------+----+------+
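
Since the question asks about explode: the same rows can probably also be produced in a single pass, without a union, by packing both variants of each row into an array of structs and exploding it. The following is a minimal sketch and not part of the original answer. It assumes Spark 2.x or later with a local SparkSession; the names spark, exploded and the temporary column variant are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Assumption: a local session; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("inches-to-cm")
  .getOrCreate()
import spark.implicits._

// Sample data from the question
val df = Seq(
  (1, "in", 55),
  (2, "in", 54)
).toDF("groupid", "unit", "height")

// Build one array per input row holding two structs with identical field
// names and types (unit: string, height: double), then explode the array
// so that each input row becomes two output rows.
val exploded = df
  .withColumn("variant", explode(array(
    struct(col("unit").as("unit"), col("height").cast("double").as("height")),
    struct(lit("cm").as("unit"), (col("height") * 2.54).as("height"))
  )))
  .select(
    col("groupid"),
    col("variant.unit").as("unit"),
    col("variant.height").as("height")
  )

exploded.show(false)
```

With this approach the inch and centimetre rows for a given groupid typically come out next to each other instead of all inch rows first. Neither approach guarantees a row order without an explicit orderBy, and neither needs a UDF.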

answered Sep 28 '22 08:09

Ramesh Maharjan

Tags:

scala

apache-spark

explode

apache-spark-sql
