I am using Spark version 2.1 in Databricks. I have a data frame named wamp, to which I want to add a column named region that should take the constant value NE. However, I get an error saying NameError: name 'lit' is not defined when I run the following command:
wamp = wamp.withColumn('region', lit('NE'))
What am I doing wrong?
Add a new column with a constant value: in PySpark, to add a new column to a DataFrame, use the lit() function, imported with from pyspark.sql.functions import lit. lit() takes the constant value you want to add and returns a Column; to add a NULL/None column, use lit(None).
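You need to import lit, either with
from pyspark.sql.functions import *
which will make lit available (note that this wildcard import also shadows Python builtins such as sum, min and max), or with something like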
import pyspark.sql.functions as sf
wamp = wamp.withColumn('region', sf.lit('NE'))
muon@ provided the correct answer above. Just adding a quick reproducible version to increase clarity.
>>> from pyspark.sql.functions import lit
>>> df = spark.createDataFrame([(1, 4, 3)], ['a', 'b', 'c'])
>>> df.show()
+---+---+---+
| a| b| c|
+---+---+---+
| 1| 4| 3|
+---+---+---+
>>> df = df.withColumn("d", lit(5))
>>> df.show()
+---+---+---+---+
| a| b| c| d|
+---+---+---+---+
| 1| 4| 3| 5|
+---+---+---+---+