How to sum the values of a column in a PySpark DataFrame

I am working in PySpark and I have a DataFrame with the following columns.

Q1 = spark.read.csv("Q1final.csv",header = True, inferSchema = True)
Q1.printSchema()

root
|-- index_date: integer (nullable = true)
|-- item_id: integer (nullable = true)
|-- item_COICOP_CLASSIFICATION: integer (nullable = true)
|-- item_desc: string (nullable = true)
|-- index_algorithm: integer (nullable = true)
|-- stratum_ind: integer (nullable = true)
|-- item_index: double (nullable = true)
|-- all_gm_index: double (nullable = true)
|-- gm_ra_index: double (nullable = true)
|-- coicop_weight: double (nullable = true)
|-- item_weight: double (nullable = true)
|-- cpih_coicop_weight: double (nullable = true)

I need the sum of all the elements in the last column (cpih_coicop_weight) to use as a Double in other parts of my program. How can I do it? Thank you very much in advance!

Lauren asked Feb 01 '18 17:02


People also ask

How do you sum values of a column in PySpark?

Apply the sum() function to the column to get its total, then call collect() to retrieve the result. Here df is the input PySpark DataFrame and column_name is the column to sum.

How do you sum two columns in PySpark?

To calculate the sum of two or more columns in PySpark, use the + operator on the columns. A second method is to compute the sum and add it to the DataFrame as a new column, using the same + operation together with the select function.


1 Answer

If you just want a double or int returned, the following function will work:

from pyspark.sql import functions as F

def sum_col(df, col):
    return df.select(F.sum(col)).collect()[0][0]  # one row, one column

Then

sum_col(Q1, 'cpih_coicop_weight')

will return the sum. I am new to PySpark, so I am not sure why such a simple method is not available on the column object in the library.

Louis Yang answered Sep 21 '22 17:09