How to get element by Index in Spark RDD (Java)

I know of the method rdd.first(), which gives me the first element of an RDD.

There is also the method rdd.take(num), which gives me the first "num" elements.
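
For example (a quick sketch of what I mean; assume rdd is a JavaRDD<String> over (a, b, c)):

String first = rdd.first();          // "a"
List<String> firstTwo = rdd.take(2); // ["a", "b"]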

But isn't there a way to get an element by its index?

Thanks.

progNewbie asked Nov 09 '14




1 Answer

This should be possible by first indexing the RDD. The transformation zipWithIndex provides a stable indexing, numbering each element in its original order.

Given: rdd = (a,b,c)

val withIndex = rdd.zipWithIndex // ((a,0),(b,1),(c,2)) 

This form is not useful for looking up an element by index, though. First we need to use the index as the key:

val indexKey = withIndex.map{case (k,v) => (v,k)}  //((0,a),(1,b),(2,c)) 

Now, it's possible to use the lookup action in PairRDD to find an element by key:

val b = indexKey.lookup(1) // Array(b) 

If you expect to use lookup often on the same RDD, I'd recommend caching the indexKey RDD to improve performance.

How to do this using the Java API is an exercise left for the reader; a possible translation is sketched below.
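
A minimal, self-contained sketch of the same steps in Java (the class and variable names are my own, and it assumes Spark's standard JavaRDD/JavaPairRDD API with a local master for testing):

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class LookupByIndex {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("lookup-by-index").setMaster("local[*]"));

        JavaRDD<String> rdd = sc.parallelize(Arrays.asList("a", "b", "c"));

        // Zip each element with its index: (a,0), (b,1), (c,2)
        JavaPairRDD<String, Long> withIndex = rdd.zipWithIndex();

        // Swap each pair so the index becomes the key: (0,a), (1,b), (2,c)
        JavaPairRDD<Long, String> indexKey =
                withIndex.mapToPair(t -> new Tuple2<>(t._2(), t._1()));

        // Cache if you expect to call lookup repeatedly on the same RDD
        indexKey.cache();

        List<String> result = indexKey.lookup(1L); // [b]
        System.out.println(result);

        sc.stop();
    }
}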

maasg answered Sep 29 '22