
Issue with RDD - list index out of range

I am using values obtained from one RDD in another. I use the first RDD to calculate averages and do a .collect() to fetch it into a variable called z.

When I access z, however, I get the error "list index out of range".

What am I doing wrong?

avgtuples = summedtuples.map(lambda (ct, (Sx, Sy)): (((Sx*1.0)/ct), ((Sy*1.0)/ct)))
z = avgtuples.collect()
newmap = reducedhostbyte.map(lambda (h, (x, y)): (n, get_vals(x, y, z[0], z[1])))

The value of z is [(24.910157132138149, 474512.76637794758)].

mhn asked Oct 19 '22 23:10


1 Answer

If z is [(24.910157132138149, 474512.76637794758)], it is a list with a single element. So z[1] causes an IndexError.

That single element is a two-element tuple, and presumably you want z[0] and z[1] to refer to the two values inside it. If so, this is what you need:

z = avgtuples.collect()[0]

(Note the [0] at the end. It takes the first (and only) element of the list.)
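The difference is easy to check in plain Python, without Spark. The following is only a sketch: the name collected is a stand-in for the result of avgtuples.collect(), filled with the value quoted in the question, and avg_x and avg_y are illustrative names that do not appear in the original code.

```python
# Stand-in for avgtuples.collect(): a list holding one (avg_x, avg_y) tuple.
collected = [(24.910157132138149, 474512.76637794758)]

# The list has only one element, so position 1 does not exist.
try:
    collected[1]
except IndexError as exc:
    error_message = str(exc)

# Taking element 0 yields the tuple itself, and its two fields can be indexed.
z = collected[0]
avg_x, avg_y = z[0], z[1]
```

With z bound to the tuple rather than the list, the z[0] and z[1] in the question's lambda refer to the two averages.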

It's strange that you would have a single-row RDD (summedtuples) in the first place. There is probably more that could be improved in your code, but that's outside of the scope of the question.

Daniel Darabos answered Oct 21 '22 15:10