
How do I groupBy and concat a list in a DataFrame in Spark Scala?

I have a DataFrame with two columns, with data as below:

+----+-----------------+
|acct|           device|
+----+-----------------+
|   B|       List(3, 4)|
|   C|       List(3, 5)|
|   A|       List(2, 6)|
|   B|List(3, 11, 4, 9)|
|   C|       List(5, 6)|
|   A|List(2, 10, 7, 6)|
+----+-----------------+

And I need the result as below:

+----+-----------------+
|acct|           device|
+----+-----------------+
|   B|List(3, 4, 11, 9)|
|   C|    List(3, 5, 6)|
|   A|List(2, 6, 7, 10)|
+----+-----------------+

I tried the following, but it doesn't seem to work:

df.groupBy("acct").agg(concat("device"))

df.groupBy("acct").agg(collect_set("device"))

How can I achieve this using Scala?

asked May 08 '18 by Babu

People also ask

How do I concatenate a Spark in a DataFrame?

Spark SQL provides the concat() function to concatenate two or more DataFrame columns into a single column. It can also take columns of different data types and concatenate them into one column; for example, it supports String, Int, Boolean, and array columns.
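
A minimal sketch of concat() on string columns, assuming a SparkSession named spark is in scope; the columns first and last and the sample rows are hypothetical:

import org.apache.spark.sql.functions.{concat, lit}
import spark.implicits._

val people = Seq(("John", "Doe"), ("Jane", "Roe")).toDF("first", "last")
// Concatenate the two columns with a space in between.
people.select(concat($"first", lit(" "), $"last").as("full_name")).show()
// +---------+
// |full_name|
// +---------+
// | John Doe|
// | Jane Roe|
// +---------+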

How do I get other columns with Spark DataFrame groupBy?

Suppose you have a DataFrame that includes the columns "name" and "age", and you want to group by these two columns. To keep the other columns after a groupBy, you can join the aggregated result back to the original DataFrame; the joined data will then have all columns, including the aggregate values.
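
A sketch of that join-back pattern; the name and age columns follow the description above, while the city column and the values are made up for illustration:

import spark.implicits._ // assumes a SparkSession named `spark`

val people = Seq(("Alice", 30, "NY"), ("Bob", 25, "LA"), ("Alice", 30, "SF"))
  .toDF("name", "age", "city")

// Aggregate, then join the counts back to keep the other columns.
val counts = people.groupBy("name", "age").count()
val dataJoined = people.join(counts, Seq("name", "age"))
dataJoined.show()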

How do you do a group by in Spark?

Similar to the SQL GROUP BY clause, Spark SQL's groupBy() function collects identical data into groups on a DataFrame/Dataset so that aggregate functions like count(), min(), max(), avg(), and mean() can be run on the grouped data.
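
A small illustration under the same assumptions (a SparkSession named spark; the dept/amount data is hypothetical):

import org.apache.spark.sql.functions.{count, min, max, avg}
import spark.implicits._

val sales = Seq(("A", 10), ("A", 20), ("B", 5)).toDF("dept", "amount")
// Run several aggregates over each group in one pass.
sales.groupBy("dept")
  .agg(count("amount"), min("amount"), max("amount"), avg("amount"))
  .show()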

How do I join multiple columns in Spark Scala?

Using join syntax: this join syntax takes the right Dataset, joinExprs, and joinType as arguments, and joinExprs provides the join condition on multiple columns. This example joins the emptDF DataFrame with the deptDF DataFrame on the dept_id and branch_id columns using an inner join.
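
A sketch of that multi-column join; emptDF and deptDF are named as in the snippet, but their schemas here are invented for illustration:

import spark.implicits._

val emptDF = Seq((1, 100, 10, "Alice"), (2, 200, 20, "Bob"))
  .toDF("emp_id", "dept_id", "branch_id", "name")
val deptDF = Seq((100, 10, "Sales"), (200, 20, "IT"))
  .toDF("dept_id", "branch_id", "dept_name")

// Inner join on both dept_id and branch_id.
val joined = emptDF.join(deptDF,
  emptDF("dept_id") === deptDF("dept_id") &&
    emptDF("branch_id") === deptDF("branch_id"),
  "inner")
joined.show()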


1 Answer

You can start by exploding the device column and continue as you did, but note that this might not preserve the order of the lists (which isn't guaranteed by any group-by anyway):

import org.apache.spark.sql.functions.{explode, collect_set}
import spark.implicits._ // for $"..."; assumes a SparkSession named `spark`

val result = df.withColumn("device", explode($"device"))
  .groupBy("acct")
  .agg(collect_set("device"))

result.show(truncate = false)
// +----+-------------------+
// |acct|collect_set(device)|
// +----+-------------------+
// |B   |[9, 3, 4, 11]      |
// |C   |[5, 6, 3]          |
// |A   |[2, 6, 10, 7]      |
// +----+-------------------+
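
If you are on Spark 2.4 or later, a possible alternative (a sketch, not part of the original answer) avoids the explode by collecting the lists and flattening them. flatten and array_distinct were added in 2.4; array_distinct keeps the first occurrence of each value, though collect_list's ordering is still not guaranteed after a shuffle:

import org.apache.spark.sql.functions.{flatten, array_distinct, collect_list}

// Collect each group's lists, flatten them into one array, then dedupe.
val result2 = df.groupBy("acct")
  .agg(array_distinct(flatten(collect_list($"device"))).as("device"))

result2.show(truncate = false)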
answered Sep 22 '22 by Tzach Zohar