How to append an element to an array column of a Spark Dataframe?

Tags:

scala

apache-spark

Suppose I have the following DataFrame:

scala> val df1 = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1)))
df1: org.apache.spark.sql.DataFrame = [id: string, nums: array<int>]

scala> df1.show()
+---+----+
| id|nums|
+---+----+
|  a| [1]|
|  b| [1]|
+---+----+

And I want to add elements to the array in the nums column, so that I get something like the following:

+---+-------+
| id|nums   |
+---+-------+
|  a| [1,5] |
|  b| [1,5] |
+---+-------+

Is there a way to do this using the .withColumn() method of the DataFrame? E.g.

val df2 = df1.withColumn("nums", append(col("nums"), lit(5)))

I've looked through the API documentation for Spark, but can't find anything that would allow me to do this. I could probably use split and concat_ws to hack something together, but I would prefer a more elegant solution if one is possible. Thanks.

403

asked Apr 06 '18 04:04

Shafique Jamal

1 Answer

import org.apache.spark.sql.functions.{lit, array, array_union}

val df1 = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1)))
val df2 = df1.withColumn("nums", array_union($"nums", lit(Array(5))))
df2.show

+---+------+
| id|  nums|
+---+------+
|  a|[1, 5]|
|  b|[1, 5]|
+---+------+

array_union() was added in Spark 2.4.0, which was released on November 2, 2018, seven months after you asked the question. :) See https://spark.apache.org/news/index.html
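One caveat worth illustrating: array_union is a set union, so it drops duplicates and will not add a value that is already in the array. If a true append is wanted, concat (which also accepts array columns from Spark 2.4) may be a better fit. The following is a minimal sketch, assuming Spark 2.4 or later and a local SparkSession; the object name and the sample data are made up for illustration and are not from the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, array_union, col, concat, lit}

// Hypothetical standalone app; in spark-shell only the two withColumn lines are needed.
object AppendToArrayColumn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("append-to-array")
      .getOrCreate()
    import spark.implicits._

    // Start with arrays that already contain the value we want to add.
    val df = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1), lit(5)))

    // array_union is a set union: 5 is already present, so nums stays [1, 5].
    df.withColumn("nums", array_union(col("nums"), array(lit(5)))).show()

    // concat keeps duplicates, so nums becomes [1, 5, 5].
    df.withColumn("nums", concat(col("nums"), array(lit(5)))).show()

    spark.stop()
  }
}
```

On Spark 3.4 or later, array_append(col("nums"), lit(5)) should give the same result as the concat call, but check the functions API for the version you are running.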

answered Sep 27 '22 23:09

Dorren Chen
