I have a PySpark DataFrame, df1, that looks like:
CustomerID CustomerValue
12 .17
14 .15
14 .25
17 .50
17 .01
17 .35
I have a second PySpark DataFrame, df2, that is df1 grouped by CustomerID and aggregated by the sum function. It looks like this:
CustomerID CustomerValueSum
12 .17
14 .40
17 .86
I want to add a third column to df1 that is df1['CustomerValue'] divided by df2['CustomerValueSum'] for the same CustomerIDs. This would look like:
CustomerID CustomerValue NormalizedCustomerValue
12 .17 1.00
14 .15 .38
14 .25 .62
17 .50 .58
17 .01 .01
17 .35 .41
In other words, I'm trying to convert this Python/Pandas code to PySpark:
normalized_list = []
for idx, row in df1.iterrows():
    (
        normalized_list
        .append(
            row.CustomerValue / df2[df2.CustomerID == row.CustomerID].CustomerValueSum
        )
    )
df1['NormalizedCustomerValue'] = [val.values[0] for val in normalized_list]
How can I do this?
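Join df1 to df2 on CustomerID, divide CustomerValue by CustomerValueSum, then drop the sum column, which is no longer needed.
Code: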
import pyspark.sql.functions as F
df1 = (
    df1
    .join(df2, "CustomerID")
    .withColumn("NormalizedCustomerValue", F.col("CustomerValue") / F.col("CustomerValueSum"))
    .drop("CustomerValueSum")
)
Output:
df1.show()
+----------+-------------+-----------------------+
|CustomerID|CustomerValue|NormalizedCustomerValue|
+----------+-------------+-----------------------+
|        17|          0.5|     0.5813953488372093|
|        17|         0.01|   0.011627906976744186|
|        17|         0.35|     0.4069767441860465|
|        12|         0.17|                    1.0|
|        14|         0.15|    0.37499999999999994|
|        14|         0.25|                  0.625|
+----------+-------------+-----------------------+