I have a loop that generates several tables of factors and stores each table's column names in a list:
| id | f_1a | f_2a |
|:---|:-----|:-----|
| 1  | 1.2  | 0.95 |
| 2  | 0.7  | 0.87 |
| 3  | 1.2  | 1.4  |
col_lst = ['f_1a','f_2a']
| id | f_1b | f_2b | f_3b |
|:---|:-----|:-----|:-----|
| 1  | 1.6  | 1.2  | 0.98 |
| 2  | 0.9  | 0.65 | 1.7  |
| 3  | 1.1  | 1.33 | 1.4  |
col_lst = ['f_1b','f_2b','f_3b']
I'm having difficulty writing PySpark code that creates a new column containing the product of the listed columns for each table, such that:
| id | f_1a | f_2a | f_a  |
|:---|:-----|:-----|:-----|
| 1  | 1.2  | 0.95 | 1.14 |
| 2  | 0.7  | 0.87 | 0.61 |
| 3  | 1.2  | 1.4  | 1.68 |
| id | f_1b | f_2b | f_3b | f_b  |
|:---|:-----|:-----|:-----|:-----|
| 1  | 1.6  | 1.2  | 0.98 | 1.88 |
| 2  | 0.9  | 0.65 | 1.7  | 1    |
| 3  | 1.1  | 1.33 | 1.4  | 2.05 |
Any help would be greatly appreciated.
Use reduce to apply an anonymous function that multiplies the column values row-wise.
from functools import reduce
from pyspark.sql import functions as F

df = spark.createDataFrame([(1, 1.6, 1.2, 0.98),
                            (2, 0.9, 0.65, 1.7),
                            (3, 1.1, 1.33, 1.4)],
                           ('id', 'f_1b', 'f_2b', 'f_3b'))
df.show()
Solution:
df.withColumn('f_b', reduce(lambda a, b: F.round(a * b, 2), [F.col(c) for c in df.drop('id').columns])).show()
Outcome:
+---+----+----+----+----+
| id|f_1b|f_2b|f_3b| f_b|
+---+----+----+----+----+
|  1| 1.6| 1.2|0.98|1.88|
|  2| 0.9|0.65| 1.7| 1.0|
|  3| 1.1|1.33| 1.4|2.04|
+---+----+----+----+----+
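Note that rounding inside the reduce happens at every multiplication, so the result can drift from the fully precise product (2.04 above versus 2.05 for the last row). A minimal variant, assuming the same df as above, that multiplies first and rounds only once at the end:
from functools import reduce
from pyspark.sql import functions as F

# Multiply all factor columns first, then round the final product once
cols = [F.col(c) for c in df.drop('id').columns]
df.withColumn('f_b', F.round(reduce(lambda a, b: a * b, cols), 2)).show()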
Here is another way using an expression:
First create your col_lst:
col_lst = ['f_1b','f_2b','f_3b']
Or
col_lst = [col for col in df.columns if col != 'id']
Then:
from pyspark.sql import functions as F
df.withColumn("fb",F.round(F.expr("*".join(col_lst)),2)).show()
+---+----+----+----+----+
| id|f_1b|f_2b|f_3b| f_b|
+---+----+----+----+----+
|  1| 1.6| 1.2|0.98|1.88|
|  2| 0.9|0.65| 1.7|0.99|
|  3| 1.1|1.33| 1.4|2.05|
+---+----+----+----+----+
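Since the original question loops over several tables, either approach can be applied per table. A minimal sketch, assuming the loop collects each DataFrame together with its column list and a result-column name (df_a, df_b, and the tuples below are hypothetical placeholders for whatever your loop produces):
from pyspark.sql import functions as F

# Hypothetical pairing of each table with its factor columns and result column name
tables = [(df_a, ['f_1a', 'f_2a'], 'f_a'),
          (df_b, ['f_1b', 'f_2b', 'f_3b'], 'f_b')]

results = []
for df, col_lst, new_col in tables:
    # Build a SQL product expression from the listed columns and round once
    results.append(df.withColumn(new_col, F.round(F.expr('*'.join(col_lst)), 2)))

for res in results:
    res.show()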