Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Example of the Scala aggregate function

I have been looking and I cannot find an example or discussion of the aggregate function in Scala that I can understand. It seems pretty powerful.

Can this function be used to reduce the values of tuples to make a multimap-type collection? For example:

val list = Seq(("one", "i"), ("two", "2"), ("two", "ii"), ("one", "1"), ("four", "iv")) 

After applying aggregate:

Seq(("one" -> Seq("i","1")), ("two" -> Seq("2", "ii")), ("four" -> Seq("iv")) 

Also, can you give example of parameters z, segop, and combop? I'm unclear on what these parameters do.

like image 974
christangrant Avatar asked Aug 03 '11 14:08

christangrant


People also ask

What is an example of an aggregate function?

For example, avg() (average) can be computed by sum()/count(), where both sum() and count() are distributive aggregate functions. Similarly, it can be shown that min_N() and max_N() (which find the N minimum and N maximum values, respectively, in a given set) and standard_deviation() are algebraic aggregate functions.

What is aggregate function in Scala?

Scala | aggregate() Function. The aggregate() function is utilized to combine outcomes. Initially, a sequence operation is applied as that is the first parameter of aggregate() function and then its followed by a combine operation which is utilized to combine the solutions generated by the sequence operation performed.

What are aggregate functions discuss any three with example?

The following are the most commonly used SQL aggregate functions: AVG – calculates the average of a set of values. COUNT – counts rows in a specified table or view. MIN – gets the minimum value in a set of values.

What are the 5 aggregate functions?

SQL aggregate functions return information about the data in a database. AVG, for instance, returns the average of the values in a database column. There are five aggregate functions, which are: MIN, MAX, COUNT, SUM, and AVG.


1 Answers

Let's see if some ascii art doesn't help. Consider the type signature of aggregate:

def aggregate [B] (z: B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B 

Also, note that A refers to the type of the collection. So, let's say we have 4 elements in this collection, then aggregate might work like this:

z   A   z   A   z   A   z   A  \ /     \ /seqop\ /     \ /       B       B       B       B     \   /  combop   \   /       B _           _ B          \ combop  /               B 

Let's see a practical example of that. Say I have a GenSeq("This", "is", "an", "example"), and I want to know how many characters there are in it. I can write the following:

Note the use of par in the below snippet of code. The second function passed to aggregate is what is called after the individual sequences are computed. Scala is only able to do this for sets that can be parallelized.

import scala.collection.GenSeq val seq = GenSeq("This", "is", "an", "example") val chars = seq.par.aggregate(0)(_ + _.length, _ + _) 

So, first it would compute this:

0 + "This".length     // 4 0 + "is".length       // 2 0 + "an".length       // 2 0 + "example".length  // 7 

What it does next cannot be predicted (there are more than one way of combining the results), but it might do this (like in the ascii art above):

4 + 2 // 6 2 + 7 // 9 

At which point it concludes with

6 + 9 // 15 

which gives the final result. Now, this is a bit similar in structure to foldLeft, but it has an additional function (B, B) => B, which fold doesn't have. This function, however, enables it to work in parallel!

Consider, for example, that each of the four computations initial computations are independent of each other, and can be done in parallel. The next two (resulting in 6 and 9) can be started once their computations on which they depend are finished, but these two can also run in parallel.

The 7 computations, parallelized as above, could take as little as the same time 3 serial computations.

Actually, with such a small collection the cost in synchronizing computation would be big enough to wipe out any gains. Furthermore, if you folded this, it would only take 4 computations total. Once your collections get larger, however, you start to see some real gains.

Consider, on the other hand, foldLeft. Because it doesn't have the additional function, it cannot parallelize any computation:

(((0 + "This".length) + "is".length) + "an".length) + "example".length 

Each of the inner parenthesis must be computed before the outer one can proceed.

like image 146
Daniel C. Sobral Avatar answered Sep 23 '22 14:09

Daniel C. Sobral