How Can I Obtain an Element Position in Spark's RDD?

Tags:

I am new to Apache Spark, and I know that the core data structure is RDD. Now I am writing some apps which require element positional information. For example, after converting an ArrayList into a (Java)RDD, for each integer in RDD, I need to know its (global) array subscript. Is it possible to do it?

As I know, there is a take(int) function for RDD, so I believe the positional information is still maintained in RDD.

415

asked Sep 25 '14 19:09

wayi

2 Answers

I believe in most cases, zipWithIndex() will do the trick, and it will preserve the order. Read the comments again. My understanding is that it exactly means keep the order in the RDD.

scala> val r1 = sc.parallelize(List("a", "b", "c", "d", "e", "f", "g"), 3)
scala> val r2 = r1.zipWithIndex
scala> r2.foreach(println)
(c,2)
(d,3)
(e,4)
(f,5)
(g,6)
(a,0)
(b,1)

Above example confirm it. The red has 3 partitions, and a with index 0, b with index 1, etc.

136

answered Sep 23 '22 18:09

zhang zhan

Essentially, RDD's zipWithIndex() method seems to do this, but it won't preserve the original ordering of the data the RDD was created from. At least you'll get a stable ordering.

val orig: RDD[String] = ...
val indexed: RDD[(String, Long)] = orig.zipWithIndex()

The reason you're unlikely to find something that preserves the order in the original data is buried in the API doc for zipWithIndex():

"Zips this RDD with its element indices. The ordering is first based on the partition index and then the ordering of items within each partition. So the first item in the first partition gets index 0, and the last item in the last partition receives the largest index. This is similar to Scala's zipWithIndex but it uses Long instead of Int as the index type. This method needs to trigger a spark job when this RDD contains more than one partitions."

So it looks like the original order is discarded. If preserving the original order is important to you, it looks like you need to add the index before you create the RDD.

answered Sep 22 '22 18:09

Spiro Michaylov

Related questions
                            
                                Calculating displacement using Accelerometer and Gyroscope (MPU6050)
                            
                                CSS3 transform rotate text, fixed position left and right, vertically centered
                            
                                Magento 1.9.1 not sorting Configurable product attributes dropdown by position
                            
                                xpath - get first 10 items of selected set
                            
                                Unable to position pager (navigation bar) above jqGrid
                            
                                Form position on lower right corner of the screen
                            
                                gVim - show position marker of matched search terms
                            
                                Find caret position in textarea in pixels [duplicate]
                            
                                Android recyclerview issue with animations
                            
                                How can I quickly find the relatively positioned HTML parent element of an absolutely positioned child?
                            
                                Simulate position:fixed in jQuery
                            
                                CSS Borders: Distance from Object Edge?
                            
                                CSS float right moves element right and down (I don't want down).
                            
                                Generate paired stacked bar charts in ggplot (using position_dodge only on some variables)
                            
                                Why does fixed positioning alter the width of an element?
                            
                                How to get indexes of all occurrences of a pattern in a string
                            
                                jquery animate position in percentage
                            
                                jquery scroll to specific div position in case of error and focus
                            
                                get new x/y positions of element after scroll using jQuery
                            
                                How align 2 adjacent divs horizontally WITHOUT float?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How Can I Obtain an Element Position in Spark's RDD?

Tags:

position

apache-spark

rdd

wayi

People also ask

2 Answers

zhang zhan

Spiro Michaylov

Recent Activity

Donate For Us