
Spark Dataframe column with last character of other column

I'm looking for a way to get the last character from a string in a dataframe column and place it into another column.

I have a Spark dataframe that looks like this:

    animal
    ======
    cat
    mouse
    snake

I want something like this:

    lastchar
    ========
    t
    e
    e

Right now I do this with a UDF that looks like:

    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    def get_last_letter(animal):
        return animal[-1]

    get_last_letter_udf = udf(get_last_letter, StringType())

    df.select(get_last_letter_udf("animal").alias("lastchar")).show()

I'm mainly curious if there's a better way to do this without a UDF. Thanks!

asked Aug 04 '17 by mikestaszel

2 Answers

Just use the substring function, which accepts a negative start position to count from the end of the string:

    from pyspark.sql.functions import col, substring

    df.withColumn("lastchar", substring(col("animal"), -1, 1))
answered Oct 12 '22 by Assaf Mendelson


Another way to do this would be with the expr function:

    from pyspark.sql.functions import expr

    df.withColumn("lastchar", expr('RIGHT(animal, 1)')).show()
answered Oct 12 '22 by gcollar