I have a DataFrame, df1, with the following schema:
root
|-- col1: string (nullable = true)
|-- col2: array (nullable = true)
| |-- element: string (containsNull = true)
One of the columns, col2, is an array such as [1#b, 2#b, 3#c]. I want to convert it to the string 1#b,2#b,3#c.
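I am currently doing this with the following snippet:
from pyspark.sql.functions import collect_list, concat_ws, explode
df2 = df1.select("*", explode("col2")).drop("col2")  # exploded values land in a column named "col"
df3 = df2.groupBy("col1").agg(concat_ws(",", collect_list("col")).alias("col2"))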
This works, but it is slow and seems inefficient.
Is there a better alternative?
You can call concat_ws directly on the array column. It joins the elements of each row's array, so there is no need to explode and then group back:
from pyspark.sql.functions import concat_ws
df2 = df1.withColumn('col2', concat_ws(',', 'col2'))