how to modify one column value in one row used by pyspark

Tags:

pyspark

I want to update a column value in the row where userid=22650984. How do I do this in PySpark? Thank you for helping.

>>>xxDF.select('userid','registration_time').filter('userid="22650984"').show(truncate=False)
18/04/08 10:57:00 WARN TaskSetManager: Lost task 0.1 in stage 57.0 (TID 874, shopee-hadoop-slave89, executor 9): TaskKilled (killed intentionally)
18/04/08 10:57:00 WARN TaskSetManager: Lost task 11.1 in stage 57.0 (TID 875, shopee-hadoop-slave97, executor 16): TaskKilled (killed intentionally)
+--------+----------------------------+
|userid  |registration_time           |
+--------+----------------------------+
|22650984|270972-04-26 13:14:46.345152|
+--------+----------------------------+
asked Apr 08 '18 by Frank

People also ask

How do you change a column value in PySpark?

You can replace column values of a PySpark DataFrame by using the SQL string functions regexp_replace(), translate(), and overlay().
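For example, a minimal sketch using regexp_replace() (the DataFrame and the region column here are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_replace

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("22650984", "US-NY")], ["userid", "region"])

# Replace the "US-" prefix in the region column with an empty string
df = df.withColumn("region", regexp_replace("region", "^US-", ""))
df.show()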

How do I change a column value in spark DataFrame?

The Spark DataFrame withColumn() function is used to update the value of a column. withColumn() takes two arguments: first, the column you want to update, and second, the value you want to update it with. If the specified column name is not found, it creates a new column with the specified value.
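A minimal sketch of both behaviors (the score and bonus columns are made up for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(22650984, 100)], ["userid", "score"])

# "score" already exists, so its values are updated
df = df.withColumn("score", col("score") * 2)

# "bonus" does not exist yet, so a new column is created
df = df.withColumn("bonus", lit(10))
df.show()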

How do you split a column value in PySpark DataFrame?

PySpark SQL provides the split() function to convert a delimiter-separated string into an array (StringType to ArrayType) column on a DataFrame. This is done by splitting the string column on a delimiter such as a space, comma, or pipe, and converting the result into an ArrayType column.
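For instance, a minimal sketch (the letters column is hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql.functions import split

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a,b,c",)], ["letters"])

# Split the comma-separated string into an ArrayType column
df = df.withColumn("letters_arr", split("letters", ","))
df.show(truncate=False)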

How do you strip column values in PySpark?

In PySpark, the trim() function from pyspark.sql.functions strips leading and trailing whitespace from column values (ltrim() and rtrim() strip one side only). Note that drop() is different: it removes whole columns from the DataFrame rather than stripping values.
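A minimal sketch of trim() (the name column is made up):

from pyspark.sql import SparkSession
from pyspark.sql.functions import trim

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("  Frank  ",)], ["name"])

# trim() removes leading and trailing whitespace from each value
df = df.withColumn("name", trim("name"))
df.show()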


1 Answer

If you want to modify a subset of your DataFrame while keeping the rest unchanged, your best option is pyspark.sql.functions.when(), since using filter() or pyspark.sql.functions.where() would remove all rows where the condition is not met.

from pyspark.sql.functions import col, when

valueWhenTrue = None  # for example

# withColumn() returns a new DataFrame, so capture the result
df = df.withColumn(
    "existingColumnToUpdate",
    when(
        col("userid") == 22650984,
        valueWhenTrue
    ).otherwise(col("existingColumnToUpdate"))
)

when() evaluates its first argument as a boolean condition. If the condition is True, it returns the second argument. You can chain together multiple when() calls, and use otherwise() to specify what to return when none of the conditions are met.
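For example, a minimal sketch of chaining (the score and tier columns here are hypothetical, not from the question):

from pyspark.sql.functions import col, when

# Hypothetical tiering example: chained when() calls with a final otherwise()
df = df.withColumn(
    "tier",
    when(col("score") >= 90, "gold")
    .when(col("score") >= 50, "silver")
    .otherwise("bronze")
)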

In this example, I am updating the existing column "existingColumnToUpdate". When userid equals the specified value, the column is set to valueWhenTrue; otherwise, the existing value is kept unchanged.

answered Sep 28 '22 by pault