The line:
df.withColumn("test", expr("concat(lon, lat)"))
works as expected but
df.withColumn("test", expr("concat(lon, lit(','), lat)"))
produces the following exception:
org.apache.spark.sql.AnalysisException: Undefined function: 'lit'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 12 at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$49.apply(Analyzer.scala:1198)
Why? And what would be the workaround?
lit creates a Column from a literal value.
PySpark expr() takes a SQL-like expression string and evaluates it, which lets you pass existing DataFrame column values as arguments to built-in functions. Most of the commonly used SQL functions are available either on the Column class or as built-in functions.
The string argument to expr is parsed as a SQL expression and used to construct a column. lit is not a SQL function; it belongs to the DataFrame API, where it wraps a literal value in a Column. The SQL analyzer therefore cannot resolve it and throws the AnalysisException shown in the question.
To solve this, simply remove the lit part and write the comma as an ordinary SQL string literal:
df.withColumn("test", expr("concat(lon, ',', lat)"))
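As a minimal sketch of the point above, the following self-contained program shows that the string passed to expr is ordinary Spark SQL, so the same text also works with selectExpr. The local SparkSession, the object name, and the sample coordinates are assumptions made up for illustration; the columns are assumed to be strings.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object ExprLiteralSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, not part of the original question.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("expr-literal-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with string columns named as in the question.
    val df = Seq(("13.40", "52.52"), ("2.35", "48.86")).toDF("lon", "lat")

    // Inside a SQL expression, a literal is written as a quoted string, not lit(...).
    val viaExpr = df.withColumn("test", expr("concat(lon, ',', lat)"))

    // selectExpr parses its arguments with the same SQL parser.
    val viaSelectExpr = df.selectExpr("*", "concat(lon, ',', lat) AS test")

    viaExpr.show()
    viaSelectExpr.show()

    spark.stop()
  }
}
```

Both DataFrames should end up with the same test column, since both strings go through the same SQL expression parser.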
Or use Spark's built-in concat function directly, without expr:
df.withColumn("test", concat($"lon", lit(","), $"lat"))
Since concat takes columns as arguments, lit must be used here.
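For completeness, here is a hedged sketch of the Column-based variant with the imports it needs, plus concat_ws as an alternative that takes the separator as a plain string and so needs no lit. The session setup, object name, and sample rows are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit}

object ConcatColumnSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, not part of the original question.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("concat-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with string columns named as in the question.
    val df = Seq(("13.40", "52.52"), ("2.35", "48.86")).toDF("lon", "lat")

    // concat accepts Column arguments only, so the separator is wrapped in lit.
    val withLit = df.withColumn("test", concat(col("lon"), lit(","), col("lat")))

    // concat_ws takes the separator as a String, followed by the columns to join.
    val withSeparator = df.withColumn("test", concat_ws(",", col("lon"), col("lat")))

    withLit.show()
    withSeparator.show()

    spark.stop()
  }
}
```

One difference worth keeping in mind when choosing between the two: concat returns null if any input is null, whereas concat_ws skips null inputs.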