In Spark 1.3, is there a way to access the key from mapValues?
Specifically, if I have
val y = x.groupBy(someKey)
val z = y.mapValues(someFun)
can someFun know which key of y it is currently operating on?
Or do I have to do
val y = x.map(r => (someKey(r), r)).groupBy(_._1)
val z = y.mapValues{ case (k, r) => someFun(r, k) }
Note: the reason I want to use mapValues rather than map is to preserve the partitioning.
mapValues maps the values while keeping the keys. Notice that for the key-value pair (3, 6), it produces (3, Range()), since 6 to 5 yields an empty collection of values. flatMap "breaks down" collections into their elements.
In this case you can use mapPartitions with the preservesPartitioning parameter set to true.
x.mapPartitions((it => it.map { case (k,rr) => (k, someFun(rr, k)) }), preservesPartitioning = true)
You just have to make sure you are not changing the partitioning, i.e. don't change the key.
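As a rough end-to-end sketch of the approach above, the following applies it to a grouped RDD and checks that the partitioner survives. The bodies of someKey and someFun, the object name, and the keyAware helper are hypothetical stand-ins chosen for illustration; only groupBy, mapPartitions, and partitioner are Spark's own.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object KeyAwareMapValues {
  // Hypothetical stand-ins for the question's someKey and someFun.
  def someKey(r: Int): Int = r % 3
  def someFun(rs: Iterable[Int], k: Int): Int = rs.sum + k

  // Applies someFun to each group together with its key, leaving the key as-is.
  def keyAware(y: RDD[(Int, Iterable[Int])]): RDD[(Int, Int)] =
    y.mapPartitions(
      it => it.map { case (k, rs) => (k, someFun(rs, k)) },
      preservesPartitioning = true)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("key-aware").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val x = sc.parallelize(1 to 10)
      val y = x.groupBy(r => someKey(r)) // grouped, so y carries a partitioner
      val z = keyAware(y)
      // z reports the same partitioner as y because of preservesPartitioning.
      println(z.partitioner == y.partitioner)
      z.collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because the key is passed through untouched, a later operation keyed on the same partitioner should not need another shuffle.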
You can't use the key with mapValues. But you can preserve the partitioning with mapPartitions.
import org.apache.spark.rdd.RDD

val pairs: RDD[(Int, Int)] = ???
pairs.mapPartitions({ it =>
  it.map { case (k, v) =>
    // your code; return (k, newValue) with k unchanged
  }
}, preservesPartitioning = true)
Be careful to actually preserve the partitioning; the compiler cannot check it for you.
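One way to reduce that risk is to wrap the pattern in a small helper whose signature only lets the caller compute a new value, so the key cannot change by construction. This is a minimal sketch: PairRddOps and mapValuesWithKey are hypothetical names, not part of Spark's API.

```scala
import org.apache.spark.rdd.RDD

object PairRddOps {
  // Hypothetical helper (not a Spark method): f sees the key and the value
  // but only returns the new value, so the original key is always re-emitted.
  def mapValuesWithKey[K, V, U](rdd: RDD[(K, V)])(f: (K, V) => U): RDD[(K, U)] =
    rdd.mapPartitions(
      it => it.map { case (k, v) => (k, f(k, v)) },
      preservesPartitioning = true)
}
```

With a pairs RDD as above, a call might look like PairRddOps.mapValuesWithKey(pairs)((k, v) => k + v); the result reports the same partitioner as pairs.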