 

Adding constant value column to spark dataframe

I am using Spark version 2.1 in Databricks. I have a data frame named wamp to which I want to add a column named region which should take the constant value NE. However, I get an error saying NameError: name 'lit' is not defined when I run the following command:

wamp = wamp.withColumn('region', lit('NE'))

What am I doing wrong?

Gaurav Bansal asked May 17 '17 at 19:05



People also ask

How do I add a constant column in Spark DataFrame?

To add a new column with a constant value to a PySpark DataFrame, use the lit() function, imported with from pyspark.sql.functions import lit. lit() takes the constant value you want to add and returns a Column. To add a column of nulls, use lit(None).

How do I change the DataFrame column value in Spark?

You can replace column values in a PySpark DataFrame with the SQL string functions regexp_replace(), translate(), and overlay(); passing the existing column name to withColumn() overwrites that column with the result.


2 Answers

You need to import lit.

Either

from pyspark.sql.functions import *

which makes lit (and every other function in that module) available,

or import the module under an alias:

import pyspark.sql.functions as sf
wamp = wamp.withColumn('region', sf.lit('NE'))
muon answered Oct 02 '22 at 20:10



@muon provided the correct answer above. I'm just adding a quick reproducible version for clarity.

>>> from pyspark.sql.functions import lit
>>> df = spark.createDataFrame([(1, 4, 3)], ['a', 'b', 'c'])
>>> df.show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  4|  3|
+---+---+---+

>>> df = df.withColumn("d", lit(5))
>>> df.show()
+---+---+---+---+
|  a|  b|  c|  d|
+---+---+---+---+
|  1|  4|  3|  5|
+---+---+---+---+
Joarder Kamal answered Oct 02 '22 at 18:10
