I am trying to improve my Spark Scala skills, and I have a case that I cannot figure out how to handle, so please advise!
I have the original data as shown in the figure below:
I want to calculate the percentage that each value in the count column represents. E.g. the last error value is 64; how much is 64 as a percentage of the sum of all the values in the column? Please note that I am reading the original data as DataFrames using sqlContext. Here is my code:
val df1 = df.groupBy(" Code")
  .agg(sum("count").alias("sum"), mean("count")
    .multiply(100)
    .cast("integer").alias("percentage"))
I want results similar to this:
Thanks in advance!
Use agg and window functions:
import org.apache.spark.sql.expressions._
import org.apache.spark.sql.functions._

df
  .groupBy("code")
  .agg(sum("count").alias("count"))
  // over() with no window spec spans every row of the grouped result,
  // so sum("count").over() is the grand total
  .withColumn("fraction", col("count") / sum("count").over())