I have a DataFrame with 3 columns: Id, First Name, and Last Name. I want to group by Id and collect the First Name and Last Name columns as lists.
Example: I have a DataFrame like this
+---+-------+--------+
|id |fName |lName |
+---+-------+--------+
|1 |Akash |Sethi |
|2 |Kunal |Kapoor |
|3 |Rishabh|Verma |
|2 |Sonu |Mehrotra|
+---+-------+--------+
and I want my output like this
+---+--------------+-------------------+
|id |fName         |lName              |
+---+--------------+-------------------+
|1  |[Akash]       |[Sethi]            |
|2  |[Kunal, Sonu] |[Kapoor, Mehrotra] |
|3  |[Rishabh]     |[Verma]            |
+---+--------------+-------------------+
Thanks in Advance
1 Answer
Suppose you have a DataFrame df with columns "name" and "age", and you want to group by these two columns. To get the other columns back after a groupBy, you can join the aggregated result to the original DataFrame; the joined result (data_joined in the sketch below) will then have all the original columns as well as the count values.
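For example, a minimal sketch of that approach (the DataFrame df and the column names are the ones assumed above; data_joined is just an illustrative variable name):

# Count rows per (name, age) group, then join the counts back
# so the result keeps all original columns plus "count".
counts = df.groupBy("name", "age").count()
data_joined = df.join(counts, on=["name", "age"])
data_joined.show()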
You can select a single column or multiple columns of a Spark DataFrame by passing the column names you want to the select() function. Since DataFrames are immutable, this creates a new DataFrame containing only the selected columns; the show() function displays the DataFrame contents.
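For instance, reusing the column names from the question's sample DataFrame (purely illustrative):

# select() returns a new DataFrame with only the named columns;
# show() prints its contents to the console.
df.select("fName").show()
df.select("fName", "lName").show()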
To calculate the sum of two or more columns in PySpark, use the + operator on the columns. You can either compute the sum inside a select(), or use the same + operation to add the sum to the DataFrame as a new column.
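A short sketch of both variants, assuming a DataFrame df2 with numeric columns col1 and col2 (all three names are made up for illustration):

from pyspark.sql.functions import col

# Method 1: compute the sum inside a select()
df2.select((col("col1") + col("col2")).alias("sum_col")).show()

# Method 2: add the sum to the DataFrame as a new column
df2_with_sum = df2.withColumn("sum_col", col("col1") + col("col2"))
df2_with_sum.show()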
You can aggregate multiple columns like this:
df.groupBy("id").agg(collect_list("fName"), collect_list("lName"))
It will give you the expected result.
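For completeness, here is a runnable sketch that builds the sample DataFrame from the question and reproduces the expected output (assuming a local SparkSession; aliases are added so the output column names match the example):

from pyspark.sql import SparkSession
from pyspark.sql.functions import collect_list

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "Akash", "Sethi"), (2, "Kunal", "Kapoor"),
     (3, "Rishabh", "Verma"), (2, "Sonu", "Mehrotra")],
    ["id", "fName", "lName"],
)
result = df.groupBy("id").agg(
    collect_list("fName").alias("fName"),
    collect_list("lName").alias("lName"),
)
result.show(truncate=False)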