
How to divide two aggregate sums in a PySpark DataFrame

I want to divide the sum of one column by the sum of another in PySpark. For example, I have a dataset like the one below:

    A  B  C
 1  1  2  3
 2  1  2  3
 3  1  2  3

What I want is the sum of column B divided by the sum of column A:

  6 (sum of B) / 3 (sum of A) = 2

I have tried this:

sumofA = df.groupby().sum('A')
sumofB = df.groupby().sum('B')

result = sumofB / sumofA

but it produces this error:

TypeError: unsupported operand type(s) for /: 'DataFrame' and 'DataFrame'
asked Nov 28 '25 00:11 by foy



1 Answer

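Your approach was on the right track, but groupby().sum() returns a one-row DataFrame rather than a number, and Python's / operator is not defined between two DataFrames. Instead, do the calculation inside a single aggregation: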

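from pyspark.sql import functions as F
# groupBy() with no columns aggregates over the whole DataFrame;
# both sums and the division are evaluated in one Spark aggregation
df.groupBy().agg(F.sum("B")/F.sum("A")).show()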
+-----------------+
|(sum(B) / sum(A))|
+-----------------+
|              2.0|
+-----------------+

Or, to get the result as a plain Python value, take the first field of the first collected Row with collect()[0][0]:

from pyspark.sql import functions as F
a=df.groupBy().agg(F.sum("B")/F.sum("A")).collect()[0][0]
a

Out[5]: 2.0
answered Nov 29 '25 13:11 by murtihash
