PySpark 2.4: TypeError: Column is not iterable (with F.col() usage)

The following gives me a TypeError: Column is not iterable exception:

from pyspark.sql import Row
from pyspark.sql import functions as F

df = spark_sesn.createDataFrame([Row(col0 = 10,
                                     col2 = 'ten',
                                     col3 = 10.0),])

df.withColumn('key',        F.lit('1')) # This succeeds.
df.withColumn(F.col('key'), F.lit('1')) # This causes an exception. <---- TypeError

You might be wondering why I want to use the second variation at all. It's because I need to access the .alias() method to add metadata to that column, like so:

df.withColumn(F.col('key').alias('key', metadata={'foo':'bar'}), F.lit('1'))

How do we get the second variation to work and/or insert the metadata needed? Keep in mind that the real DataFrame already exists (meaning, I can't create one from scratch like I did in this simple example).

Thank you! =:)

asked Dec 29 '25 by NYCeyes
1 Answer

withColumn requires the first parameter to be a string (the name of the new column), so I don't think the second variation can work. You can instead use select to add the new column with an alias:

df.select("*", F.lit(1).alias("key", metadata={"foo": "bar"})).show()
+----+----+----+---+
|col0|col2|col3|key|
+----+----+----+---+
|  10| ten|10.0|  1|
+----+----+----+---+
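
If it helps, you can confirm the metadata actually landed on the new column by inspecting the schema of the result. A quick check (df2 is just my name for the DataFrame returned by the select above):

df2 = df.select("*", F.lit(1).alias("key", metadata={"foo": "bar"}))

# Each StructField in the resulting schema carries its metadata dict.
print([f.metadata for f in df2.schema.fields if f.name == "key"])
# [{'foo': 'bar'}]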

Alternatively, you can call .alias() on F.lit() when using withColumn:

df.withColumn("key", F.lit(1).alias(None, metadata={"foo": "bar"})).show()
+----+----+----+---+
|col0|col2|col3|key|
+----+----+----+---+
|  10| ten|10.0|  1|
+----+----+----+---+
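
Since your real DataFrame already exists, the same idea should also work for attaching metadata to a column that is already there: overwrite the column with itself via withColumn and alias it with the metadata. A minimal sketch, reusing col0 from the example above:

# Re-assign an existing column to itself, carrying the metadata along.
df_with_meta = df.withColumn(
    "col0",
    F.col("col0").alias("col0", metadata={"foo": "bar"})
)

print(df_with_meta.schema["col0"].metadata)
# {'foo': 'bar'}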
answered Dec 31 '25 by Psidom