I have a pyspark DataFrame like:
+------------------------+
|ids                     |
+------------------------+
|[101826, 101827, 101576]|
+------------------------+
and I want to explode this DataFrame like:
+------+----------------+
|id    |ids             |
+------+----------------+
|101826|[101827, 101576]|
|101827|[101826, 101576]|
|101576|[101826, 101827]|
+------+----------------+
How can I do this using a PySpark UDF or other methods?
The easiest way is to explode `ids` into a new `id` column, then use `array_except` to remove each row's `id` from its copy of `ids`. Code below.

from pyspark.sql.functions import col, explode, array, array_except

(
    df1.withColumn('id', explode('ids'))
       .withColumn('ids', array_except(col('ids'), array('id')))
).show(truncate=False)
+------+----------------+
|id    |ids             |
+------+----------------+
|101826|[101827, 101576]|
|101827|[101826, 101576]|
|101576|[101826, 101827]|
+------+----------------+
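For clarity, the same per-row logic can be sketched in plain Python without Spark; the function name `explode_with_rest` is illustrative, not a Spark API. Note that Spark's `array_except` also drops duplicates, so results can differ from this sketch when `ids` contains repeated values.

```python
def explode_with_rest(ids):
    # For each element, pair it with the remaining elements —
    # this mirrors explode + array_except applied to one row.
    return [(x, [y for y in ids if y != x]) for x in ids]

rows = explode_with_rest([101826, 101827, 101576])
for id_, rest in rows:
    print(id_, rest)
# 101826 [101827, 101576]
# 101827 [101826, 101576]
# 101576 [101826, 101827]
```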