How can I sum multiple columns in a Spark DataFrame in PySpark?

Tags: python, apache-spark, apache-spark-sql, pyspark

I have a list of column names that I want to sum:

columns = ['col1','col2','col3']

How can I add the three together and put the result in a new column? It should work automatically, so that I can change the column list and get new results.

The DataFrame with the result I want:

col1   col2   col3   result
 1      2      3       6

asked Nov 14 '18 10:11

Manrique

1 Answer

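Try this:

df = df.withColumn('result', sum(df[col] for col in columns))

Here columns is your list of column names, and Python's built-in sum() adds the column expressions together, so editing the list changes the result. To sum every column in the DataFrame instead, iterate over df.columns, which is the list of all column names in df; that only matches your list when the DataFrame contains exactly those columns. Note that if you have run from pyspark.sql.functions import *, the name sum refers to Spark's aggregate function rather than Python's built-in, and this line will not work as intended; call builtins.sum explicitly in that case.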

answered Oct 22 '22 22:10

Mayank Porwal
