I am using values obtained from one RDD in another. I use the first RDD to calculate averages and do a .collect() to fetch the result into a variable called z.
When accessing z, however, I get the error "list index out of range".
What am I doing wrong?
avgtuples = summedtuples.map(lambda (ct, (Sx, Sy)): (((Sx*1.0)/ct), ((Sy*1.0)/ct)))
z = avgtuples.collect()
newmap = reducedhostbyte.map(lambda (h, (x, y)): (n, get_vals(x, y, z[0], z[1])))
The value of z is [(24.910157132138149, 474512.76637794758)].
If z is [(24.910157132138149, 474512.76637794758)], it is a list with a single element, so z[1] raises an IndexError.
That single element (z[0]) is a two-element tuple, so presumably you want to access its two values as z[0] and z[1]. If so, this is what you need:
z = avgtuples.collect()[0]
(Note the [0] at the end. It takes the first (and only) element of the list.)
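To make the difference concrete, here is a minimal sketch in plain Python, with no Spark involved. The name collected is hypothetical and simply stands in for whatever avgtuples.collect() returns; the values are the ones shown in the question.

```python
# Stand-in for the result of avgtuples.collect(): a list holding one tuple.
collected = [(24.910157132138149, 474512.76637794758)]

# The list has only one element, so index 1 does not exist.
try:
    collected[1]
except IndexError as exc:
    error_message = str(exc)  # the same error the question reports

# Taking element 0 of the list yields the tuple itself.
z = collected[0]

# Now z[0] and z[1] are the two averages.
avg_x, avg_y = z[0], z[1]
print(avg_x, avg_y)
```

In PySpark, avgtuples.first() should return the same tuple for a non-empty RDD, which avoids building the intermediate list.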
It's strange that you would have a single-row RDD (summedtuples) in the first place. There is probably more that could be improved in your code, but that's outside the scope of the question.