I am very new to Apache Spark, so this question may be naive, but I do not understand the difference between combineByKey and aggregateByKey, or when to use which operation.
groupByKey can cause out-of-memory problems (or heavy disk spill), because every value is sent over the network and collected on the reducing workers; see the sketch below. With reduceByKey, by contrast, values are combined within each partition first, so only one output per key per partition is sent over the network.
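A minimal sketch of the difference, assuming an existing SparkContext sc and a made-up words dataset:

val words = sc.parallelize(List(("a", 1), ("b", 1), ("a", 1), ("a", 1)))

// groupByKey ships every (key, value) pair across the network
// and only then sums on the reduce side:
val countsGrouped = words.groupByKey().mapValues(_.sum)

// reduceByKey first sums within each partition (map-side combine),
// so only one partial sum per key per partition is shuffled:
val countsReduced = words.reduceByKey(_ + _)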
combineByKey is a generic function for combining the elements of each key using a custom set of aggregation functions. Internally, Spark's combineByKey combines the values of a pair RDD partition by partition, applying the supplied aggregation functions.
reduceByKey() scales better than groupByKey() on large datasets: pairs with the same key on the same machine are combined (using the function passed to reduceByKey()) before the data is shuffled.
aggregateByKey is one of the aggregation functions available in Spark (others are reduceByKey and groupByKey). Because its accumulator type can differ from the value type, it can compute several aggregations (maximum, minimum, average, sum, and count) at the same time, as the sketch below shows.
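For instance, here is a sketch (assuming an existing SparkContext sc and made-up data) that computes min, max, sum, and count per key in a single pass, from which the average also follows:

val nums = sc.parallelize(List(("a", 3), ("a", 7), ("b", 5)))

// The accumulator is (min, max, sum, count)
val stats = nums.aggregateByKey((Int.MaxValue, Int.MinValue, 0, 0))(
  // seqOp: fold one value into the accumulator within a partition
  (acc, v) => (math.min(acc._1, v), math.max(acc._2, v), acc._3 + v, acc._4 + 1),
  // combOp: merge accumulators coming from different partitions
  (a, b) => (math.min(a._1, b._1), math.max(a._2, b._2), a._3 + b._3, a._4 + b._4)
)

// The average derives from sum and count:
val avg = stats.mapValues { case (_, _, sum, cnt) => sum.toDouble / cnt }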
aggregateByKey
takes an initial accumulator value, a first lambda function to merge a value into an accumulator, and a second lambda function to merge two accumulators.
combineByKey
is more general: instead of an initial accumulator value, it takes an initial lambda function that creates the accumulator from the first value seen for a key.
Here is an example:
val pairs = sc.parallelize(List(("prova", 1), ("ciao", 2),
                                ("prova", 2), ("ciao", 4),
                                ("prova", 3), ("ciao", 6)))

// aggregateByKey: the empty list is the initial accumulator value
pairs.aggregateByKey(List[Any]())(
  (aggr, value) => aggr ::: (value :: Nil),  // merge a value into an accumulator
  (aggr1, aggr2) => aggr1 ::: aggr2          // merge two accumulators
).collect().toMap
// combineByKey: the first lambda creates the initial accumulator from a value
pairs.combineByKey(
  (value) => List(value),
  (aggr: List[Any], value) => aggr ::: (value :: Nil),     // merge a value into an accumulator
  (aggr1: List[Any], aggr2: List[Any]) => aggr1 ::: aggr2  // merge two accumulators
).collect().toMap
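Both calls produce the same result, roughly Map(prova -> List(1, 2, 3), ciao -> List(2, 4, 6)); the order of values inside each list may vary with partitioning.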