I want to update a value in the row where userid = 22650984. How can I do this in PySpark? Thank you for your help.
>>> xxDF.select('userid','registration_time').filter('userid="22650984"').show(truncate=False)
18/04/08 10:57:00 WARN TaskSetManager: Lost task 0.1 in stage 57.0 (TID 874, shopee-hadoop-slave89, executor 9): TaskKilled (killed intentionally)
18/04/08 10:57:00 WARN TaskSetManager: Lost task 11.1 in stage 57.0 (TID 875, shopee-hadoop-slave97, executor 16): TaskKilled (killed intentionally)
+--------+----------------------------+
|userid  |registration_time           |
+--------+----------------------------+
|22650984|270972-04-26 13:14:46.345152|
+--------+----------------------------+
You can replace column values in a PySpark DataFrame using the SQL string functions regexp_replace(), translate(), and overlay().
The DataFrame withColumn() method is used to update the values of a column. It takes two arguments: the name of the column to update and a Column expression giving the new value. If no column with that name exists, a new column is created with the specified value. withColumn() returns a new DataFrame rather than modifying the original.
PySpark SQL provides the split() function to convert a delimiter-separated string column into an array column (StringType to ArrayType). It splits the string on a delimiter such as a space, comma, or pipe and returns the pieces as an ArrayType column.
In PySpark, the DataFrame drop() method removes one or more columns from a DataFrame. Rows containing null values are removed with dropna() instead.
If you want to modify a subset of your DataFrame and keep the rest unchanged, the best option is pyspark.sql.functions.when(), because filter() (or its alias, the DataFrame method where()) would remove all rows where the condition is not met.
from pyspark.sql.functions import col, when

valueWhenTrue = None  # for example

# withColumn() returns a new DataFrame, so assign the result
df = df.withColumn(
    "existingColumnToUpdate",
    when(
        col("userid") == 22650984,
        valueWhenTrue
    ).otherwise(col("existingColumnToUpdate"))
)
when() evaluates its first argument as a boolean condition. If the condition is True, it returns the second argument. You can chain together multiple when() calls, as shown in this post and also this post, or use otherwise() to specify what to return when the condition is False. If otherwise() is omitted, rows that match no condition get null.
In the withColumn() example above, I am updating an existing column, "existingColumnToUpdate". When userid is equal to the specified value, the column is set to valueWhenTrue. Otherwise, the existing value in the column is kept unchanged.