How to sum the values of a column in pyspark dataframe

Tags: dataframe, sum, apache-spark, pyspark

I am working in PySpark and have a DataFrame with the following columns:

Q1 = spark.read.csv("Q1final.csv", header=True, inferSchema=True)
Q1.printSchema()

root
|-- index_date: integer (nullable = true)
|-- item_id: integer (nullable = true)
|-- item_COICOP_CLASSIFICATION: integer (nullable = true)
|-- item_desc: string (nullable = true)
|-- index_algorithm: integer (nullable = true)
|-- stratum_ind: integer (nullable = true)
|-- item_index: double (nullable = true)
|-- all_gm_index: double (nullable = true)
|-- gm_ra_index: double (nullable = true)
|-- coicop_weight: double (nullable = true)
|-- item_weight: double (nullable = true)
|-- cpih_coicop_weight: double (nullable = true)

I need the sum of all the elements in the last column (cpih_coicop_weight) to use as a Double in other parts of my program. How can I do it? Thank you very much in advance!

asked Feb 01 '18 17:02

Lauren

1 Answer

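If you just want the sum back as a plain Python number (a float or an int), the following function will work:

from pyspark.sql import functions as F
def sum_col(df, col):
    # Aggregate the column into a single row, then take the value out of that Row.
    return df.select(F.sum(col)).collect()[0][0]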

Then

sum_col(Q1, 'cpih_coicop_weight')

will return the sum as a Python float, or None if the column has no non-null values. I am new to PySpark, so I am not sure why the Column object does not offer such a simple method directly.

answered Sep 21 '22 17:09

Louis Yang
