 

Scala/Spark - Multiply an Integer with each value in a Dataframe Column

I have a sample dataframe:

df_that_I_have
+---------+---------+-------+
| country | members | some  |
+---------+---------+-------+
| India   | 50      | 1     |
+---------+---------+-------+
| Japan   | 20      | 3     |
+---------+---------+-------+
| India   | 20      | 1     |
+---------+---------+-------+
| Japan   | 10      | 3     |
+---------+---------+-------+

and I want a dataframe that looks like this:

df_that_I_want
+---------+---------+-------+
| country | members | some  |
+---------+---------+-------+
| India   | 70      | 10    | // 5 * Sum of "some" for India, i.e. (1 + 1)
+---------+---------+-------+
| Japan   | 30      | 30    | // 5 * Sum of "some" for Japan, i.e. (3 + 3)
+---------+---------+-------+

The second dataframe has, per country, the sum of members and the sum of some multiplied by 5.

This is what I'm doing to achieve this:

val df_that_I_want = df_that_I_have
                        .select(df_that_I_have("country"),
                                df_that_I_have.groupBy("country").sum("members"),
                                5 * df_that_I_have.groupBy("country").sum("some")) //Problem here

But the compiler does not allow me to do this because apparently I can't multiply 5 with a column.

How can I multiply an Integer value with the sum of some for each country?
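To make the intended arithmetic concrete, here is an illustration in plain Scala collections (no Spark, hypothetical variable names) of the computation I want:

```scala
// Group rows by country, sum "members", and multiply the sum of "some" by 5.
val rows = Seq(("India", 50, 1), ("Japan", 20, 3), ("India", 20, 1), ("Japan", 10, 3))
val wanted = rows
  .groupBy(_._1)                                    // group by country
  .map { case (country, grp) =>
    (country, grp.map(_._2).sum, grp.map(_._3).sum * 5)
  }
  .toSeq
  .sortBy(_._1)
// wanted: Seq(("India", 70, 10), ("Japan", 30, 30))
```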

Amber asked Apr 18 '17 07:04



2 Answers

You can try the lit function.

scala> val df_that_I_have = Seq(("India",50,1),("India",20,1),("Japan",20,3),("Japan",10,3)).toDF("Country","Members","Some")
df_that_I_have: org.apache.spark.sql.DataFrame = [Country: string, Members: int, Some: int]

scala> val df1 = df_that_I_have.groupBy("country").agg(sum("members"), sum("some") * lit(5))
df1: org.apache.spark.sql.DataFrame = [country: string, sum(members): bigint, ((sum(some),mode=Complete,isDistinct=false) * 5): bigint]

scala> val df_that_I_want= df1.select($"Country",$"sum(Members)".alias("Members"), $"((sum(Some),mode=Complete,isDistinct=false) * 5)".alias("Some"))
df_that_I_want: org.apache.spark.sql.DataFrame = [Country: string, Members: bigint, Some: bigint]

scala> df_that_I_want.show

+-------+-------+----+
|Country|Members|Some|
+-------+-------+----+
|  India|     70|  10|
|  Japan|     30|  30|
+-------+-------+----+
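Note that the long generated column name referenced in the select above (`((sum(Some),mode=Complete,isDistinct=false) * 5)`) comes from an older Spark version and varies between releases. A sketch that avoids depending on generated names (assuming Spark 2.x or later) is to alias the aggregates directly inside agg:

```scala
import org.apache.spark.sql.functions.{sum, lit}

// Alias each aggregate at the point of definition, so no
// version-specific generated column names need to be referenced later.
val df_that_I_want = df_that_I_have
  .groupBy("country")
  .agg(
    sum("members").alias("members"),
    (sum("some") * lit(5)).alias("some")
  )
```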
Rajat Mishra answered Oct 03 '22 08:10


Please try this (note: selecting only "country" first would drop the "members" and "some" columns and make the aggregation fail, so group the full dataframe):

df_that_I_have.groupBy("country").agg(sum("members"), sum("some") * lit(5))
pasha701 answered Oct 03 '22 10:10