
create column with length of strings in another column pyspark

I have a column in a data frame in pyspark like “Col1” below. I would like to create a new column “Col2” with the length of each string from “Col1”. I’m new to pyspark, I’ve been googling but haven’t seen any examples of how to do this. Any tips are very much appreciated.

example:

Col1 Col2
12   2
123  3
user3476463 asked May 11 '18 23:05


People also ask

How do you get string length in PySpark?

To get the string length of a column in PySpark, use the length() function. The same function can be used to filter rows by the length of a column's values.

How do I get the length of a column in PySpark DataFrame?

PySpark Example to Filter DataFrame by the length of a Column. In PySpark you can use the length() function by importing from pyspark.

How do I create a new column from an existing column in PySpark?

In PySpark, add a new column to a DataFrame with withColumn(). For a constant value, pass lit() (from pyspark.sql.functions import lit): lit() takes the constant you want to add and returns a Column. To add a NULL / None column, use lit(None). To derive the new column from an existing one, pass a column expression such as length() instead of lit().


1 Answer

You can use the length function:

import pyspark.sql.functions as F
df.withColumn('Col2', F.length('Col1')).show()
+----+----+
|Col1|Col2|
+----+----+
|  12|   2|
| 123|   3|
+----+----+
Psidom answered Sep 19 '22 08:09