I want to convert the values inside a column to lowercase. If I call .lower() on the column, it complains that Column objects are not callable. Since SQL has a lower() function, I assume there's a native Spark solution that doesn't involve UDFs or writing any SQL.
To convert a column to lower case in PySpark, use the lower() function; upper() converts to upper case, and initcap() converts to title (proper) case.
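For instance, a minimal sketch of all three functions on a made-up name column:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, initcap, lower, upper

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data for illustration
df = spark.createDataFrame([("jOhN dOe",)], ["name"])

df.select(
    lower(col("name")).alias("lower"),    # 'john doe'
    upper(col("name")).alias("upper"),    # 'JOHN DOE'
    initcap(col("name")).alias("title"),  # 'John Doe'
).show()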
PySpark's withColumn() is a DataFrame transformation used to change the values of an existing column, convert its datatype, create a new column, and more.
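A rough illustration of those three uses, with hypothetical column names:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, length, lower

spark = SparkSession.builder.getOrCreate()

# Made-up frame with string columns 'name' and 'age'
df = spark.createDataFrame([("MiXeD", "42")], ["name", "age"])

df = (
    df.withColumn("name", lower(col("name")))       # change existing values
      .withColumn("age", col("age").cast("int"))    # convert the datatype
      .withColumn("name_len", length(col("name")))  # create a new column
)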
Import lower alongside col:
from pyspark.sql.functions import lower, col
Combine them using lower(col('bla')). In a complete query:
spark.table('bla').select(lower(col('bla')).alias('bla'))
which is equivalent to the SQL query
SELECT lower(bla) AS bla FROM bla
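If you do want the SQL route, the same statement can be run directly, assuming a table or temp view named bla is registered:

spark.sql("SELECT lower(bla) AS bla FROM bla")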
To keep the other columns, do
spark.table('foo').withColumn('bar', lower(col('bar')))
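Putting it together, a small end-to-end sketch with made-up data:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lower

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice", "NYC"), ("BOB", "LA")], ["name", "city"])

# Lowercase 'name' while leaving 'city' untouched
df.withColumn("name", lower(col("name"))).show()
# +-----+----+
# | name|city|
# +-----+----+
# |alice| NYC|
# |  bob|  LA|
# +-----+----+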
Needless to say, this approach is better than using a UDF: UDFs have to call out to Python (serialization to and from the Python worker is slow, and the Python code itself runs slower than Spark's built-in JVM implementation), and it is more elegant than writing raw SQL.
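For comparison, the UDF version being avoided would look roughly like this; every value makes a round trip through a Python worker:

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

# The slow alternative: each row is shipped to the Python interpreter
lower_udf = udf(lambda s: s.lower() if s is not None else None, StringType())
df.withColumn("bar", lower_udf(col("bar")))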
If the column is an array of strings, you can use a combination of concat_ws and split:
from pyspark.sql.functions import concat_ws, lower, split

# Join the array into one string, lowercase it, then split back into an array
df.withColumn('arr_str', lower(concat_ws('::', 'arr'))) \
  .withColumn('arr', split('arr_str', '::')) \
  .drop('arr_str')
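A quick check with toy data; note the '::' delimiter is an arbitrary choice and assumes it never appears inside the array elements. On Spark 3.1+, transform avoids the string round trip entirely:

from pyspark.sql import SparkSession
from pyspark.sql.functions import concat_ws, lower, split, transform

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(["FOO", "Bar"],)], ["arr"])

# Round trip through a string: ["FOO", "Bar"] -> "foo::bar" -> ["foo", "bar"]
df.withColumn("arr_str", lower(concat_ws("::", "arr"))) \
  .withColumn("arr", split("arr_str", "::")) \
  .drop("arr_str") \
  .show()

# Spark 3.1+ alternative: apply lower() to each element directly
df.withColumn("arr", transform("arr", lambda x: lower(x))).show()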