remove last character from string

Question

I am trying to create a new DataFrame column (b) by removing the last character from column (a). Column a is a string of varying length, so I am trying the following code:

from pyspark.sql.functions import *
df.select(substring('a', 1, length('a') -1 ) ).show()

I get a TypeError: 'Column' object is not callable.

It seems to be caused by combining multiple functions, but I can't understand why, as each one works on its own.

If I hardcode the length, this works:

df.select(substring('a', 1, 10 ) ).show()

Or if I use length on its own, it works:

df.select(length('a') ).show()

Why can I not combine multiple functions? Is there an easier way to remove the last character from every row in a column?

ollik1 · Accepted Answer

Using Column.substr, which accepts columns for both the start position and the length:

df.select(col('a').substr(lit(1), length(col('a')) - 1))

Or using regexp_extract, capturing everything before the final character:

df.select(regexp_extract(col('a'), '(.*).$', 1))

The substring function does not work here because its pos and len parameters need to be integers, not columns: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=substring#pyspark.sql.functions.substring

Tags:

apache-spark

apache-spark-sql

pyspark

David
