The following raises a TypeError: Column is not iterable exception:
from pyspark.sql import Row, functions as F
df = spark_sesn.createDataFrame([Row(col0=10,
                                     col2='ten',
                                     col3=10.0)])
df.withColumn('key', F.lit('1')) # This succeeds.
df.withColumn(F.col('key'), F.lit('1')) # This causes an exception. <---- TypeError
You might be wondering why I want the second variation at all. It's because I need access to the .alias() method in order to attach metadata to that column, like so:
df.withColumn(F.col('key').alias('key', metadata={'foo':'bar'}), F.lit('1'))
How do we get the second variation to work and/or insert the metadata needed? Keep in mind that the real DataFrame already exists (meaning, I can't create one from scratch like I did in this simple example).
Thank you! =:)
withColumn requires its first parameter to be a string (the name of the new column), so I don't think the second variation can ever work. You can instead use select to add the new column with an alias:
df.select("*", F.lit(1).alias("key", metadata={"foo": "bar"})).show()
+----+----+----+---+
|col0|col2|col3|key|
+----+----+----+---+
| 10| ten|10.0| 1|
+----+----+----+---+
Or you can call alias on F.lit itself when using withColumn (passing None as the name, since withColumn already supplies one):
df.withColumn("key", F.lit(1).alias(None, metadata={"foo": "bar"})).show()
+----+----+----+---+
|col0|col2|col3|key|
+----+----+----+---+
| 10| ten|10.0| 1|
+----+----+----+---+