I have a column in a data frame in pyspark like "Col1" below. I would like to create a new column "Col2" with the length of each string from "Col1". I'm new to pyspark and have been googling, but I haven't seen any examples of how to do this. Any tips are very much appreciated.
example:
Col1  Col2
12    2
123   3
In order to get the string length of a column in PySpark, we use the length() function from pyspark.sql.functions. Below we look at an example of getting the string length of a specific column, and also at filtering a DataFrame by the length of a column.
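For example, here is a minimal sketch of filtering by column length (assuming a DataFrame df with a string column Col1):

import pyspark.sql.functions as F

# keep only the rows where Col1 is longer than 2 characters
df.filter(F.length('Col1') > 2).show()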
Separately, in PySpark, to add a new column with a constant value to a DataFrame, use the lit() function: from pyspark.sql.functions import lit. lit() takes the constant value you want to add and returns a Column type; if you want to add a NULL / None value, use lit(None).
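A quick sketch of lit() (the column names and values here are just placeholders):

from pyspark.sql.functions import lit

# add a column holding the constant value 1
df = df.withColumn('Col3', lit(1))
# add a column of NULLs
df = df.withColumn('Col4', lit(None))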
You can use the length function:
import pyspark.sql.functions as F

# length() returns the number of characters in Col1
df.withColumn('Col2', F.length('Col1')).show()
+----+----+
|Col1|Col2|
+----+----+
| 12| 2|
| 123| 3|
+----+----+
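For completeness, a self-contained sketch that reproduces the example above (the SparkSession setup is an assumption; one may already exist in your environment):

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('12',), ('123',)], ['Col1'])
df.withColumn('Col2', F.length('Col1')).show()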