Pyspark RDD: find index of an element

Tags:

pyspark

I am new to PySpark. I am trying to convert a Python list to an RDD and then find the index of an element using that RDD. For the first part I am doing:

data = [[1,2],[1,4]]  # named `data` rather than `list` to avoid shadowing the built-in
rdd = sc.parallelize(data).cache()

So now the RDD holds the contents of my list. I want to find the index of an arbitrary element, something like the index method of Python lists. I am aware of a function called zipWithIndex, which assigns an index to each element, but I could not find a proper example in Python (there are examples in Java and Scala).

Thanks.


asked Apr 05 '16 22:04

ahajib

1 Answer

Use filter and zipWithIndex:

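# Each pair is (element, index). Tuple-unpacking lambdas such as
# `lambda (key, index): ...` are Python 2 only, so index into the pair instead.
(rdd.zipWithIndex()
    .filter(lambda pair: pair[0] == [1,2])
    .map(lambda pair: pair[1])
    .collect())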

Note that [1,2] here can be easily changed to a variable name and this whole expression can be wrapped within a function.
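As a minimal sketch of that wrapping, the helper below takes the RDD and the value to look for. The name find_indices is hypothetical and not part of PySpark; it only chains the zipWithIndex, filter, map and collect calls already used above.

```python
def find_indices(rdd, target):
    """Return the positions of every element of `rdd` equal to `target`.

    `find_indices` is an illustrative name, not a PySpark API.
    """
    return (
        rdd.zipWithIndex()                       # RDD of (element, index) pairs
        .filter(lambda pair: pair[0] == target)  # keep pairs whose element matches
        .map(lambda pair: pair[1])               # keep only the index
        .collect()                               # bring the matches back to the driver
    )
```

With the rdd from the question, find_indices(rdd, [1, 2]) should give [0]. A value that is not present gives an empty list instead of raising an error the way list.index does.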

How It Works

zipWithIndex simply pairs each element with its position, returning an RDD of (item, index) tuples like so:

rdd.zipWithIndex().collect()
> [([1, 2], 0), ([1, 4], 1)]
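For intuition only, the same pairs can be built on a plain Python list with enumerate. This is a local analogue that runs without Spark, not how the RDD computes them; the variable names are illustrative.

```python
data = [[1, 2], [1, 4]]

# enumerate yields (index, element), so swap the order to match
# the (element, index) layout that zipWithIndex produces.
pairs = [(element, index) for index, element in enumerate(data)]
print(pairs)  # [([1, 2], 0), ([1, 4], 1)]
```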

filter finds only those that match a particular criterion (in this case, that key equals a specific sublist):

rdd.zipWithIndex().filter(lambda pair: pair[0] == [1,2]).collect()
> [([1, 2], 0)]

map then keeps only the index from each matching pair:

(rdd.zipWithIndex().filter(lambda pair: pair[0] == [1,2])
    .map(lambda pair: pair[1]).collect())
> [0]

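If you want a single value rather than a list, index the result with [0] to get the first match. This assumes at least one match exists; indexing an empty result raises IndexError.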
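To get behaviour closer to list.index, a sketch along these lines returns a single index and raises ValueError when nothing matches. The name index_of is hypothetical. It uses take(1) rather than collect() so that only one match is brought back; since take reads partitions in order, this should be the smallest matching index.

```python
def index_of(rdd, target):
    """Return the first index of `target` in `rdd`, or raise ValueError.

    `index_of` is an illustrative name, not a PySpark API.
    """
    matches = (
        rdd.zipWithIndex()                       # RDD of (element, index) pairs
        .filter(lambda pair: pair[0] == target)  # keep pairs whose element matches
        .map(lambda pair: pair[1])               # keep only the index
        .take(1)                                 # list holding at most one index
    )
    if not matches:
        raise ValueError(f"{target!r} is not in the RDD")
    return matches[0]
```

With the rdd from the question, index_of(rdd, [1, 4]) should give 1.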


answered Nov 14 '22 21:11

Akshat Mahajan
