
Spark getting keys from key-value RDD

Tags:

apache-spark

If I have an RDD of key-value pairs and I want only the keys, what is the most efficient way to get them?

Ammar asked Jul 03 '15 17:07




1 Answer

It is very simple: yourRDD.keys()

Similarly, you can get an RDD containing only the values with yourRDD.values()

For this and other RDD transformations and actions see examples here
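As a rough illustration of the two calls above, here is a minimal PySpark sketch. The sample data, the app name, and the helper names first_elements and demo are assumptions made up for this example; only keys(), values(), map(), distinct(), parallelize() and collect() are Spark's own. In PySpark, keys() behaves like a map that keeps the first element of each pair, so it is a lazy transformation that needs no shuffle, and it keeps duplicate keys.

```python
def first_elements(rdd):
    """Hand-written equivalent of rdd.keys(): keep element 0 of every pair."""
    return rdd.map(lambda kv: kv[0])


def demo():
    # Imported here so the helper above can be read without Spark installed.
    from pyspark import SparkContext

    # "local[1]" and the app name are illustrative choices, not requirements.
    sc = SparkContext("local[1]", "keys-example")
    try:
        pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

        # keys() and values() are transformations; collect() triggers the work.
        print(pairs.keys().collect())    # ['a', 'b', 'a'] (duplicates are kept)
        print(pairs.values().collect())  # [1, 2, 3]

        # Same result as keys(), written out as an explicit map.
        print(first_elements(pairs).collect())

        # If only the unique keys are wanted, add distinct(); this step shuffles.
        print(sorted(pairs.keys().distinct().collect()))
    finally:
        sc.stop()
```

Calling demo() on a machine with pyspark installed runs the example. If only the distinct keys are needed, the extra distinct() step is where the real cost lies, since keys() on its own is a cheap per-element operation.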

lanenok answered Sep 22 '22 00:09