
Spark Group By Key to (Key,List) Pair

I am trying to group some data by key where the value would be a list:

Sample data:

A 1
A 2
B 1
B 2

Expected result:

(A,(1,2))
(B,(1,2))

I am able to do this with the following code:

data.groupByKey().mapValues(List(_))

The problem is that when I then try to do a Map operation like the following:

groupedData.map((k,v) => (k,v(0))) 

It tells me I have the wrong number of parameters.

If I try:

groupedData.map(s => (s(0),s(1)))

It tells me that "(Any,List(Iterable(Any)) does not take parameters"

No clue what I am doing wrong. Is my grouping wrong? What would be a better way to do this?

Scala answers only please. Thanks!!

manjam asked Dec 17 '15 21:12



2 Answers

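You're almost there. Just replace List(_) with _.toList. List(_) wraps the whole Iterable in a single-element list, whereas _.toList converts the Iterable itself into a list of the values: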

data.groupByKey.mapValues(_.toList)
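To see the difference without a cluster, here is a minimal sketch that uses plain Scala collections as a stand-in for the RDD. That substitution is an assumption made for illustration, and the names ToListVsListApply, grouped, wrapped and converted are hypothetical; only the List(_) versus _.toList contrast comes from the question.

```scala
// Plain Scala collections stand in for the RDD here (an assumption made
// so that the sketch runs without Spark on the classpath).
object ToListVsListApply {
  val data: Seq[(String, Int)] = Seq(("A", 1), ("A", 2), ("B", 1), ("B", 2))

  // Roughly the shape groupByKey produces: each key paired with an
  // Iterable of its values.
  val grouped: Map[String, Iterable[Int]] =
    data.groupBy(_._1).map { case (k, pairs) => (k, pairs.map(_._2)) }

  // List(vs) wraps the whole Iterable in a one-element list.
  val wrapped: Map[String, List[Iterable[Int]]] =
    grouped.map { case (k, vs) => (k, List(vs)) }

  // vs.toList converts the Iterable itself into a list of values.
  val converted: Map[String, List[Int]] =
    grouped.map { case (k, vs) => (k, vs.toList) }
}
```

The declared type of wrapped, List[Iterable[Int]], has the same nested shape as the List(Iterable(Any)) in the error message from the question.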
zero323 answered Oct 04 '22 03:10



When you write an anonymous inline function of the form

ARGS => OPERATION

the entire part before the arrow (=>) is taken as the argument list. So, in the case of

(k, v) => ...

the compiler takes that to mean a function that takes two arguments. In your case, however, you have a single argument which happens to be a tuple (here, a Tuple2, or pair; judging by the error message, your elements appear to be of type (Any, List[Iterable[Any]])).

There are a couple of ways to get around this. Note that simply wrapping the pair in an extra set of parentheses,

((x, y)) => ...

is not one of them: the Scala 2 compiler rejects it as "not a legal formal parameter", because a tuple cannot be destructured directly in a function's parameter list.

First, you can write the anonymous function as a pattern-matching function literal that matches on tuples (note the curly braces, which a case clause requires):

groupedData.map { case (k, v) => (k, v(0)) }

Finally, you can simply go with a single specified argument, as per your last attempt, but - realising it is a tuple - reference the specific field(s) within the tuple that you need:

groupedData.map(s => (s._2(0),s._2(1)))  // The key is s._1, and the value list is s._2   
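As a rough, self-contained sketch of these options, the snippet below runs on a local Seq of pairs rather than an RDD (an assumption, so that it works without Spark). The object name TupleLambdaForms and the helper firstOf are hypothetical, and the .tupled variant is an extra option not mentioned above, shown only because it adapts a genuine two-argument function to a single tuple argument.

```scala
object TupleLambdaForms {
  // Hypothetical stand-in for groupedData: a local Seq of (key, values) pairs.
  val groupedData: Seq[(String, List[Int])] =
    Seq(("A", List(1, 2)), ("B", List(1, 2)))

  // Pattern-matching function literal: destructures each pair into k and v.
  val viaCase: Seq[(String, Int)] =
    groupedData.map { case (k, v) => (k, v(0)) }

  // Single argument, with the tuple fields accessed explicitly:
  // s._1 is the key, s._2 is the value list.
  val viaFields: Seq[(String, Int)] =
    groupedData.map(s => (s._1, s._2(0)))

  // A genuine two-argument function, adapted with Function2.tupled so that
  // it accepts one pair instead of two separate arguments.
  val firstOf: (String, List[Int]) => (String, Int) = (k, v) => (k, v(0))
  val viaTupled: Seq[(String, Int)] = groupedData.map(firstOf.tupled)
}
```

All three variants pair each key with the first element of its value list; which one to use is mostly a matter of taste, though the case form tends to read best when both fields are needed.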
Shadowlands answered Oct 04 '22 05:10
